
Advanced Engineering Informatics 45 (2020) 101097

Contents lists available at ScienceDirect

Advanced Engineering Informatics


journal homepage: www.elsevier.com/locate/aei

Reinforcement learning based optimizer for improvement of predicting tunneling-induced ground responses

Pin Zhang (a), Heng Li (b), Q.P. Ha (c), Zhen-Yu Yin (a), Ren-Peng Chen (d,e,*)

(a) Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
(b) Department of Building and Real Estate, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
(c) Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia
(d) College of Civil Engineering, Hunan University, Changsha 410082, China
(e) MOE Key Laboratory of Building Safety and Energy Efficiency, Changsha 410082, China

ARTICLE INFO

Keywords: Tunnel; Ground response; Reinforcement learning; Extreme learning machine; Optimization

ABSTRACT

Prediction of ground responses is important for improving the performance of tunneling. This study proposes a novel reinforcement learning (RL) based optimizer that integrates a deep-Q network (DQN) and particle swarm optimization (PSO). The optimizer is used to improve an extreme learning machine (ELM) based tunneling-induced settlement prediction model. Herein, the DQN-PSO optimizer is used to optimize the weights and biases of the ELM. Based on the prescribed states, actions, rewards, rules and objective functions, the DQN-PSO optimizer evaluates the rewards of the actions at each step, thereby guiding particles on which action should be conducted and when it should be taken. The hybrid model is applied to a practical tunnel project. Regarding the search for the global best weights and biases of the ELM, the results indicate the DQN-PSO optimizer obviously outperforms conventional metaheuristic optimization algorithms, with higher accuracy and lower computational cost. Meanwhile, the model can identify relationships among influential factors and ground responses through self-practicing. The ultimate model can be expressed with an explicit formulation and used to predict tunneling-induced ground responses in real time, facilitating its application in engineering practice.

1. Introduction

Ground response to shield machine tunneling is a sophisticated problem that is affected by tunnel geometry, shield machine operational parameters, geological conditions and anomalous conditions [1]. The development of a rigorous analytical solution for describing tunneling-induced ground responses is complicated, because the tunneling process involves multi-disciplinary knowledge such as solid mechanics, fluid mechanics, thermodynamics and tribology. The initial analytical models were developed upon the homogeneous elastic half-space theory [2,3], in which soils are treated as an isotropic elastic material with a single layer. To consider the tunneling-induced plastic deformation, classical plasticity solutions for soil stresses and displacements [4] were obtained by assuming a cavity contraction in a linearly-elastic, plastic material with Mohr-Coulomb yielding and nonassociative flow, but this method is appropriate only in the case that the plastic zone does not extend to the ground surface [5]. Few analytical models can consider the effect of fluid-solid coupling on ground responses, although the tunneling process can cause remarkable changes in the flow regime and cause large ground subsidence. Kinematical effects of tunneling on the ground response have also been frequently reported [6,7], but these phenomenological observations merely stated the sensitivity of ground responses to these kinematical parameters. It means that an explicit model involving these parameters cannot be developed.

Existing analytical solutions account for only limited influential factors and simulate ground responses in a simplistic manner [8]. Engineers prefer to apply empirical formulations, which were derived from numerous in-situ observations, to predict ground responses due to their simplicity [9]. However, such phenomenological methods tend to be applicable only to a specific project, because influential factors such as soil types, construction methods and tunnel configuration differ between tunneling projects. Numerical modelling methods such as finite element and discrete element methods have been extensively employed to investigate ground responses to tunneling, following the improvement of software and hardware [10–12]. Such elaborate numerical models are able to simulate soil-shield machine interaction by considering numerous extrinsic and intrinsic factors, such as geological heterogeneity [11] and shield machine operation [13].


* Corresponding author at: College of Civil Engineering, Hunan University, Changsha 410082, China.
E-mail address: [email protected] (R.-P. Chen).

https://doi.org/10.1016/j.aei.2020.101097
Received 18 September 2019; Received in revised form 27 March 2020; Accepted 31 March 2020
1474-0346/ © 2020 Elsevier Ltd. All rights reserved.

Nevertheless, the parameters of soil constitutive models need to be calibrated by numerous experimental tests, and back analysis of parameters also requires considerable skill [14]. A problem that all engineers have to confront is how to predict ground responses to tunneling in a timely manner and mitigate potential risks. Considering that influential factors such as operational parameters and geological conditions vary frequently with the advance of the shield machine, empirical, analytical and numerical methods obviously have their own deficiencies in capturing the ground responses to tunneling in real time.

To predict ground responses along the whole tunnel alignment, machine learning (ML)-based surrogate models have recently been proposed to complement the deficiencies of conventional methods. Such models have a strong capability of identifying the nonlinear relationship between ground responses and various influential factors [15–17]. Prediction models are established offline by directly learning from the in-situ data and are used for online prediction of ground responses in real time with high accuracy. The current ML-based tunneling-induced ground response prediction models were developed upon quite limited datasets (fewer than 1000 datasets), thereby the model architecture is not sophisticated (fewer than 20 input variables). Researchers thus utilized metaheuristic optimization algorithms such as particle swarm optimization (PSO) to search the hyper-parameters and general parameters of these ML-based models [18]. PSO has been successfully used in many domains [19], but the original PSO primarily suffers from two issues: premature convergence and high computational cost. Premature convergence means that PSO tends to be trapped in local optima at the beginning of the search process. Meanwhile, the computational cost can increase dramatically with increasing population size, although the diversity of the swarm is beneficial to obtaining the global optima. To mitigate these issues, numerous researchers have been preoccupied with enhancing the PSO algorithm, for example with modified PSO with adaptive parameters [20,21] and hybrid PSO [22,23]. Nevertheless, which action should be chosen for particles to effectively move towards the best position, and when this action should be taken, remain a key challenge.

In this study, a more general-purpose PSO optimizer enhanced by the reinforcement learning (RL) deep Q-network (DQN) is proposed. In the past several years, RL has driven impressive advances in artificial intelligence and has rapidly extended its application scope [24–27]. In particular, models trained by DQN outperform human experts in Atari, Go and no-limit poker [28–30]. The most fundamental improvement is that deep RL algorithms do not rely on hand-crafted policy evaluation functions, in contrast with previous ML algorithms. The agent of deep RL interacts with the environment and learns from past experience like a human via self-playing, thereby continuously improving its performance. This success motivates us to propose a DQN-based PSO optimizer (DQN-PSO), in which the agent guides particles to choose the optimum action at each generation and move towards the best position with the lowest computational cost. To the best knowledge of the authors, this is the first work to use an RL (DQN) based optimizer to develop a globally optimized ML-based model for investigating ground responses to tunneling.

Hence, this study aims to develop an ELM-based ground response prediction model due to its fast calculation speed. The proposed DQN-PSO optimizer is used to optimize the ELM for identifying the global best weights and biases. A case study is implemented for validating the prediction performance of the proposed hybrid model. The ML and metaheuristic algorithms in the framework of the hybrid ELM and DQN-PSO optimizer proposed in this study can be replaced by various alternatives to explore various issues.

2. Literature review and methodology

2.1. Literature review

Machine learning (ML) is a subsection of artificial intelligence that enables a system to automatically learn from data without being explicitly programmed. ML algorithms have made significant breakthroughs with appreciable performance in many domains. They have been considered the best choice for discovering the intricate relationships among high-dimensional data [31]. Ground response to tunneling is complicated, with the coupled effects of intrinsic and extrinsic factors such as geological, geotechnical, geometric, shield operational and anomalous parameters, which brings huge difficulties to accurately predicting tunneling-induced settlement. Moreover, tunneling is a dynamic process and its influential factors always change with the advance of the shield machine, thereby real-time prediction of settlement is vitally important in engineering practice. Conventional empirical, analytical, numerical and physical modelling methods have their limitations and cannot predict soil-shield machine interaction in real time. ML algorithms provide a novel method to overcome this issue.

Since the first application of an ANN to predict tunneling-induced settlement, conducted by Shi et al. [32], various ML algorithms have been extensively used to predict soil-shield machine interaction in the last two decades. The most widely used ML algorithm is the ANN with error backpropagation [33–39]. Meanwhile, its variants such as the general regression neural network [40], wavelet neural network [14] and radial basis function neural network [40] have gained popularity in predicting soil-shield machine interaction. In the last decade, the development of ML has blossomed. Consequently, researchers have implemented various ML algorithms to predict tunneling-induced ground settlement, such as the extreme learning machine [41], adaptive neuro fuzzy inference system [42,43], relevance vector machine [44], least-squares support-vector machine [45], random forest [46–48] and genetic expression programming [42].

The key to developing an ML-based settlement prediction model is to determine the values of the hyper-parameters. In addition, the weights and biases also need to be determined for the ANN and its variants. The commonly used methods for determining hyper-parameters involve trial and error, grid search and meta-heuristic algorithms [34,39,40]. The weights and biases of ANN-based models are generally determined using deterministic and stochastic optimization algorithms [18,35]. Trial and error and grid search methods can only search the parameters in a limited space. Deterministic optimization algorithms such as gradient descent may be trapped in local optima. Stochastic algorithms suffer from premature convergence and high computational cost. The global best parameters are thus hard to obtain using such methods. To this end, this study proposes an RL (DQN) based optimizer to search the global optimum parameters of ML algorithms with higher accuracy and lower computational cost.

2.2. Reinforcement learning

2.2.1. Framework of reinforcement learning

Reinforcement learning (RL) originated from the discrete-time and finite Markov decision process (MDP). RL consists of a learning agent, an environment, states, actions, and rewards. The agent interacts with the environment at some discrete time scale, t = 0, 1, .... On each time step t, the agent perceives or observes the state of the environment, $S_t$ ($S_t \in S$), and thereafter chooses a primitive action based on this perception or observation, $A_t$ ($A_t \in A_{S_t}$). In response to each action, a, the environment thereafter produces a numerical reward, $R_{t+1}$, and changes to a next state, $S_{t+1}$ ($S_{t+1} \in S$). The whole dynamic transition process can be mathematically expressed by:

$P_{ss'}^{a} = \Pr\{S_{t+1} = s' \mid S_t = s,\ A_t = a\}$   (1)

$R_{s}^{a} = E\{R_{t+1} \mid S_t = s,\ A_t = a\}$   (2)

where $P_{ss'}^{a}$ = state transition probability matrix; $R_{s}^{a}$ = immediate reward.

Note that the action at a state is selected based on a policy, π.


$\pi(a \mid s) = P\{A_t = a \mid S_t = s\}$   (3)

Therefore, the objective of the learning agent is to learn a policy which maximizes the expected discounted future reward at each state after mapping from states to probabilities of taking each available primitive action, as shown by:

$V^{\pi}(s) = E_{\pi}\{R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s\} = E_{\pi}\{R_{t+1} + \gamma V^{\pi}(S_{t+1}) \mid S_t = s\} = \sum_{a \in A} \pi(s, a) \big[ R_{s}^{a} + \gamma \sum_{s'} P_{ss'}^{a} V^{\pi}(s') \big]$   (4)

where γ ∈ (0, 1) = a discount factor, which denotes that the reward from subsequent states gradually decreases; $V^{\pi}$ = state-value function under policy π; $V^{\pi}(s)$ = value of the state s under policy π.

The state-value function is the expected return starting from state s and then following policy π. There is another value function that is the expected return starting from state s, taking action a, and then following policy π, which is termed the action-value function, as shown in the following:

$Q^{\pi}(s, a) = E_{\pi}\{R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s, A_t = a\} = R_{s}^{a} + \gamma \sum_{s' \in S} P_{ss'}^{a} V^{\pi}(s') = R_{s}^{a} + \gamma \sum_{s' \in S} P_{ss'}^{a} \sum_{a' \in A} \pi(a' \mid s')\, Q^{\pi}(s', a')$   (5)

The objective of value-based RL algorithms such as Q-learning and Sarsa is to determine the optimum state-value function $V^{*}(s)$ or action-value function $Q^{*}(s, a)$, as shown in Eqs. (6) and (7). This study also utilizes a value-based RL algorithm to establish the model.

$V^{*}(s) = \max_{a \in A} \big[ R_{s}^{a} + \gamma \sum_{s'} P_{ss'}^{a} V^{*}(s') \big]$   (6)

$Q^{*}(s, a) = R_{s}^{a} + \gamma \sum_{s' \in S} P_{ss'}^{a} \max_{a'} Q^{*}(s', a')$   (7)
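For a small finite MDP, Eq. (6) can be solved by simple value iteration. The following sketch is illustrative only and is not part of the original study; the two-state transition probabilities and rewards are invented for this example:

import numpy as np

# Toy MDP: 2 states, 2 actions; P[a, s, t] = Pr{t | s, a}, R[s, a] = expected reward (invented)
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [2.0, -1.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(500):
    # Eq. (6): V*(s) = max_a [ R(s,a) + gamma * sum_t P(t|s,a) V*(t) ]
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V = Q.max(axis=1)
print(V)  # approximate optimal state values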

2.2.2. Deep Q network

In this study, a deep RL algorithm termed DQN, proposed by Mnih et al. [29], is used. Conventional RL algorithms generally utilize a Q table to store states and actions. The values in the Q table are updated continuously according to Eq. (8) during the learning process [49] (see Fig. 1(a)); such algorithms have thus been limited to conditions with finite and discrete states and actions.

[Fig. 1. Schematic view of reinforcement learning: (a) Q-learning; (b) deep Q-network.]

$Q(s, a) \leftarrow Q(s, a) + \alpha \big[ R_{s}^{a} + \gamma \max_{a'} Q(s', a') - Q(s, a) \big]$   (8)

where α = learning rate.
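Read as code, the tabular update of Eq. (8) is a one-line assignment. A minimal generic sketch, not taken from the paper (the environment supplying the transition s, a, r, s' is assumed to exist elsewhere; the α and γ values follow Table 1):

import numpy as np

n_states, n_actions = 10, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.01, 0.9   # learning rate and reward decay coefficient from Table 1

def q_update(s, a, r, s_next):
    # Eq. (8): Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])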

A DQN-based agent can interact with an environment with continuous states, because a DNN can parameterize an approximate action-value function Q(s, a; θi) (see Fig. 1(b)). Nevertheless, the sequential nature of the observations leads to strong correlations among them, thereby the neural network-based action-value function may be unstable and even diverge [50]; an experience replay method has therefore been proposed [29]. In this method, the agent's experience $e_t = (S_t, A_t, R_t, S_{t+1})$ at time step t is stored in a replay memory pool D. The DNN can be trained based on mini-batches $(s, a, r, s') \sim U(D)$ that are randomly drawn from the memory pool, which is beneficial for eliminating strong correlations among observations and ensures that the learning system is stable. The corresponding loss function of the DNN is:

$L_i(\theta_i) = E_{(s, a, r, s') \sim U(D)} \big[ \big( R_{s}^{a} + \gamma \max_{a'} Q(s', a'; \theta_i^{-}) - Q(s, a; \theta_i) \big)^2 \big]$   (9)

where $\theta_i$ = parameters of the Q-DNN at the ith iteration; $\theta_i^{-}$ = parameters of the target DNN at the ith iteration. Note that only the parameters of the Q-DNN are updated in real time. The target DNN is a feedforward network and has the same architecture as the Q-DNN. The update of the parameters $\theta_i^{-}$ is achieved by directly extracting the parameters from the Q-DNN at a fixed interval. In this way, training can avoid falling into feedback loops and proceed in a more stable manner.

2.3. Particle swarm optimization

Particle swarm optimization (PSO) is a metaheuristic optimization algorithm [51], which was developed based on simulating the search behavior and social interaction of animals such as fish schools and bird flocks. PSO consists of several populations of particles, and each particle is represented by its position vector $X_i^k$, velocity vector $V_i^k$ and fitness value. The velocity and position of each particle are updated using the following equations:

$V_i^{k+1} = \omega V_i^k + c_1 r_1 (pBest_i^k - X_i^k) + c_2 r_2 (gBest^k - X_i^k)$   (10)

$X_i^{k+1} = X_i^k + V_i^{k+1}$   (11)

where k = current generation; i = ith particle; ω = inertia weight; c1, c2 = cognitive and social acceleration coefficients; r1, r2 = random numbers within the range [0, 1] complying with a uniform distribution; $pBest_i$ = local best location of the ith particle; gBest = global best location of all particles. The predominant objective of the PSO algorithm is to find the optimum fitness value and the corresponding location.
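For reference, one generation of Eqs. (10) and (11) can be sketched with NumPy as follows. This is an illustrative sketch rather than the paper's implementation; the default coefficients correspond to the exploration setting later listed in Table 1:

import numpy as np

def pso_step(X, V, pbest, gbest, w=0.9, c1=2.5, c2=0.4):
    # X, V, pbest: arrays of shape (n_particles, dim); gbest: shape (dim,)
    r1 = np.random.rand(*X.shape)   # r1, r2 ~ U[0, 1], Eq. (10)
    r2 = np.random.rand(*X.shape)
    # Eq. (10): inertia + cognitive + social terms
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
    # Eq. (11): move each particle along its new velocity
    return X + V, V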


2.4. Extreme learning machine

The extreme learning machine (ELM) is a modification of the single-hidden layer feedforward neural network; that is, there is only one hidden layer in this algorithm. The weights of the input layer and the biases of the hidden layer are assigned randomly. The optimum ELM is obtained by calculating the weights of the hidden layer and the biases of the output layer, thereby the calculation speed is much faster. In general, ELM can be represented as:

$H = f(Xw + b)$   (12)

$\|H\beta - y\| = 0$   (13)

where w = weight matrix of the input layer; b = bias vector of the hidden layer; H = hidden layer output matrix; β = weight vector connecting the hidden nodes and the output nodes; y = outputs; f = activation function. The optimum ELM algorithm can be achieved by minimizing the value of $\|H\beta - y\|$. A detailed description of the ELM algorithm can be found in Huang et al. [52].

3. Introduction of proposed model

3.1. Proposed ELM-based ground response prediction model

3.1.1. Feature selection

Recent work by Zhang et al. [48] demonstrated that the influential factors of tunneling-induced settlement can be mainly classified into four categories: tunnel geometry, geological conditions, shield operational parameters and anomalous conditions. In detail, twelve parameters are vitally important to soil-tunnel interaction, including one tunnel geometry factor (cover depth of tunnel C; it should be noted that the cover depth is the only geometric factor used in this study, considering that the tunnel specification along the whole section is consistent), five shield operational parameters (thrust Th, torque To, grout filling Gf, penetration rate Pr, chamber pressure Cp), five geological parameters (modified blow counts of the standard penetration test of soil layers MSPT, modified blow counts of the dynamic penetration test of soil layers MDPT, modified uniaxial compressive strength of weathered rocks MUCS, groundwater table W and soil type at the cutterhead face St) and one anomalous condition (shield stoppage Sp). This study adopts these twelve parameters for developing the ELM-based ground response prediction model. Herein, the five shield operational parameters and Sp can be collected in real time during the tunneling process. The remaining geological and geometric parameters can be obtained during the site investigation and route design process, which are conducted before the construction of the tunnel. Therefore, tunneling-induced settlement can be predicted in real time.

3.1.2. Model architecture

The framework of the ELM-based ground response prediction model is presented in Fig. 2. The input layer has 12 neurons corresponding to the 12 input variables mentioned above, and the ground maximum settlement S is the only output variable. ELM-based models with different numbers of hidden neurons were pre-trained for selecting an appropriate framework. Considering that the focus of this study is to highlight the superiority of the proposed optimizer in the next section, the detailed processing for determining the optimum number of hidden neurons is not presented, for brevity. The results indicate the performance of the model is not sensitive to the hyper-parameter (the number of hidden neurons) when the number of hidden neurons exceeds 15. Considering the computational cost and the model performance, 20 hidden neurons are ultimately adopted in this study. The training of the ELM-based model can be obtained by:

$H = f_E(Xw + b); \quad \beta = H^{\dagger} y$   (14)

where X = input matrix (n × 12, n is the number of datasets); H = output of the hidden layer (n × 20); y = output vector (n × 1); w = weight matrix (12 × 20); b = bias vector (1 × 20); $H^{\dagger}$ (20 × n) is obtained by the Moore–Penrose generalized inverse of the matrix H [53], because H is a nonsquare matrix; β = ultimate training result (20 × 1); $f_E$ is the activation function used in the hidden layer of the ELM, and the sigmoid activation function is adopted in this study, which can be expressed by:

$f_E(x) = \dfrac{1}{1 + e^{-x}}$   (15)

[Fig. 2. Architecture of ELM-based ground response prediction model.]
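Because training reduces to the linear solve in Eq. (14), the whole 12-20-1 ELM can be written in a few lines of NumPy. The following sketch is illustrative only; the random data stand in for the real settlement database:

import numpy as np

rng = np.random.default_rng(0)
n = 200                                   # number of datasets (placeholder)
X = rng.random((n, 12))                   # input matrix (n x 12)
y = rng.random((n, 1))                    # output vector (n x 1): maximum settlement S

w = rng.uniform(-1.0, 1.0, (12, 20))      # random input weights (12 x 20), never trained
b = rng.uniform(-1.0, 1.0, (1, 20))       # random hidden biases (1 x 20), never trained

f_E = lambda v: 1.0 / (1.0 + np.exp(-v))  # Eq. (15): sigmoid activation
H = f_E(X @ w + b)                        # hidden layer output (n x 20)
beta = np.linalg.pinv(H) @ y              # Eq. (14): beta = H_dagger y via Moore-Penrose inverse

y_pred = f_E(X @ w + b) @ beta            # the explicit prediction formula of the trained model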


3.2. Proposed deep reinforcement learning-based optimizer

3.2.1. States and actions

The novel optimizer is developed based on the integration of the deep reinforcement learning algorithm DQN and the meta-heuristic optimization algorithm PSO (DQN-PSO). The search space of the population represents the environment of the DQN, and the positions of all particles represent the state of the DQN. Three actions, i.e., exploration, exploitation and jump, are considered in this study, as follows (a code sketch of the action selection is given after this list):

(i) Exploration: in PSO, ω, c1 and c2 control the movement direction and scale of particles. At the early stage of the generations, particles tend to make large movements to explore the search space and move far away from the current gBest. Therefore, the ω and c1 values are large, and the c2 value is small, as shown in Fig. 3(a). This operation is termed exploration, and the update of each particle position and velocity complies with Eqs. (10) and (11).

(ii) Exploitation: at the later stage of the generations, particles tend to make small movements to slowly converge at gBest and avoid heavy vibration. Therefore, the ω and c1 values are small, and the c2 value is large, as shown in Fig. 3(b). This operation is termed exploitation, and the update of each particle position and velocity again complies with Eqs. (10) and (11).

(iii) Jump: the former two actions achieve the adaptive adjustment of the parameters in PSO, but the algorithm is still likely to be trapped in local optima and unable to jump out of this status. Therefore, a jump action is assigned to the action space, which can be obtained by:

$X_i^{k+1} = pBest_i^k + r\,(X_{max} - X_{min})$   (16)

where r = random number within the range [−1, 1] complying with a uniform distribution; $X_{max}$ and $X_{min}$ = upper and lower bounds of the particle locations. This new location update method allows particles to jump out of local optima.

[Fig. 3. Search methods of particles: (a) exploration; (b) exploitation.]

In this study, the ε-greedy strategy is employed to select an action. The action that can generate the maximum reward according to the results of the Q-DNN is selected with probability (1 − ε), but the action is randomly selected from all available actions, for exploring unknown conditions, with probability ε.
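The ε-greedy selection and the jump action of Eq. (16) can be sketched as follows. This is illustrative only; the Q-DNN that produces q_values is assumed to exist elsewhere, and the default bounds follow Table 1:

import numpy as np

ACTIONS = ('exploration', 'exploitation', 'jump')

def select_action(q_values, eps=0.1):
    # Random action with probability eps; otherwise the highest-reward action
    if np.random.rand() < eps:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(q_values))

def jump(pbest, x_min=-10.0, x_max=10.0):
    # Eq. (16): X = pBest + r * (Xmax - Xmin) with r ~ U[-1, 1]
    r = np.random.uniform(-1.0, 1.0, np.shape(pbest))
    return pbest + r * (x_max - x_min)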

3.2.2. Boundary conditions

There are four boundary conditions in PSO, i.e., reflecting wall, damping wall, invisible wall, and absorbing wall. The absorbing wall is used to limit the position and velocity of particles in this study. As shown in Fig. 4, there are upper and lower bounds for the position and velocity vectors. When these bounds are exceeded, the position and velocity of each particle are reset to the values of the upper or lower bounds, respectively (see Eqs. (17) and (18)). Otherwise, the update of the position and velocity vectors complies with Eqs. (10), (11) and (16).

[Fig. 4. Absorbing wall boundary condition.]

$X_i^{k+1} = \begin{cases} X_{max}, & \text{if } X_i^{k+1} > X_{max} \\ X_{min}, & \text{if } X_i^{k+1} < X_{min} \end{cases}$   (17)

$V_i^{k+1} = \begin{cases} V_{max}, & \text{if } V_i^{k+1} > V_{max} \\ V_{min}, & \text{if } V_i^{k+1} < V_{min} \end{cases}$   (18)
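In code, the absorbing-wall rule of Eqs. (17) and (18) amounts to clipping. A minimal sketch, with the bounds taken from Table 1:

import numpy as np

def absorbing_wall(X, V, x_min=-10.0, x_max=10.0, v_min=-3.0, v_max=3.0):
    # Eqs. (17)-(18): out-of-range positions/velocities are reset to the nearest bound
    # ([Xmin, Xmax] = [-10, 10] and [Vmin, Vmax] = [-3, 3] per Table 1)
    return np.clip(X, x_min, x_max), np.clip(V, v_min, v_max)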
3.2.3. Basic framework

Fig. 5 presents the basic framework of the proposed optimizer DQN-PSO. The optimizer starts from creating several populations, which are also the current state of the RL. The rewards of each action under this state will be estimated by the Q-DNN, thereafter the action which can create the maximum reward under this state will be selected. The velocity and position will be updated based on the selected action, thereby the new state will be generated. After pBest and gBest are updated, the search process completes if gBest satisfies the termination condition. Otherwise, the whole process repeats.

[Fig. 5. Framework of proposed DQN-based PSO optimizer.]


3.3. Enhanced PSO optimizer

To validate the superiority of the proposed RL-based optimizer DQN-PSO, an enhanced optimizer is also developed for comparison. This enhanced PSO has two characteristics:

(i) Adaptive acceleration parameters: as mentioned above, particles start from exploring the search space and thereafter transfer to the exploitation operation as the generations increase. Therefore, a search strategy in which c1 decreases linearly and c2 increases linearly with the number of generations has been developed [54]. This strategy can improve the global search capability of PSO at the early stage and the local optimization capability at the later stage, as follows:

$c_1^k = c_{1\_initial} + \dfrac{k}{t}\,(c_{1\_final} - c_{1\_initial})$   (19)

$c_2^k = c_{2\_initial} + \dfrac{k}{t}\,(c_{2\_final} - c_{2\_initial})$   (20)

where k = current generation; t = total number of generations; $c_1^k$, $c_2^k$ = values of c1 and c2 at the kth generation, respectively; $c_{1\_initial}$, $c_{2\_initial}$ = initial values of c1 and c2, respectively; $c_{1\_final}$, $c_{2\_final}$ = final values of c1 and c2, respectively. The update of each particle position and velocity complies with Eqs. (10) and (11).

(ii) Jump: unlike the DQN-PSO optimizer, the enhanced PSO optimizer cannot intelligently select the action of particles based on the reward of each action. Therefore, when the number of generations exceeds a critical value (Eq. (21)) and the difference of the objective function outputs generated at adjacent time steps is less than a threshold value (Eq. (22)), a jump operation is activated. Thereafter the update of each particle position in the jump operation complies with Eq. (16).

$k > N_J$   (21)

$|f_{obj}^{k+1} - f_{obj}^{k}| \le f_J$   (22)

where k = current generation; $f_{obj}^{k+1}$, $f_{obj}^{k}$ = outputs of the objective function at the (k + 1)th and kth generations, respectively; $N_J$, $f_J$ = thresholds for the number of generations and for the difference of adjacent outputs of the objective function, respectively.

3.4. Proposed hybrid deep RL model

3.4.1. Reward rule and action selection

Note that the reward rule was not demonstrated in the former section, because it needs to be determined by the hybrid model. As mentioned in the description of ELM, the training process of ELM is completed merely by computing the solution of a linear system. After the hyper-parameter (the number of hidden neurons) of the ELM is determined, the model performance depends heavily on the weights and biases. To improve the performance of the ELM-based ground response prediction model, the weights and biases of the ELM are optimized by the DQN-based optimizer. In this hybrid algorithm, the state of the DQN-based optimizer represents the weights and biases of the ELM, as follows:

$X = [X_1, X_2, \ldots, X_{i-1}, X_i, \ldots, X_n]_{n \times m}$   (23)

$X_i = [x_1, x_2, \ldots, x_{i-1}, x_i, \ldots, x_m]_{1 \times m}$   (24)

where X = an aggregate of all populations; $X_i$ = a single population, i.e., an aggregate of the weights and biases of the ELM; n = the size of the population; m = the number of particles in each population, i.e., the number of weights and biases in the ELM (20 × 12 + 20 = 260, as mentioned in Section 3.1.2); $x_i$ = a single particle of a population. Therefore, the number of particles in each state is 260n.

The objective function of the DQN-PSO optimizer is the sum of squared errors (SSE), which is used to evaluate the reward value:

$SSE = \sum_{i=1}^{n} \big[\beta_i f_E(\omega_i x_i + b_i) - y_i\big]^2$   (25)

where $y_i$ = actual settlement; $\beta_i f_E(\omega_i x_i + b_i)$ = predicted settlement using the ELM-based model, in which the parameters $\beta_i$, $\omega_i$ and $b_i$ derive from β, w and b (see Section 3.1.2), respectively; $x_i$ = one set of input variables. The updates of pBest and gBest are related to the SSE value, in which all particles move towards the positions with low values of SSE. The reward rule is that the final reward r is 1 if the SSE yielded by gBest is less than the prescribed goal value; otherwise, the current model acquires a reward of 0. Note that the exact rewards are only known at the end of each episode.

3.4.2. Model framework

The pseudocode of the proposed hybrid ELM-based prediction model and DQN-PSO optimizer is presented in Algorithm 1. The hybrid algorithm involves a prescribed number of episodes. In each episode, states are updated continuously until the SSE value yielded by gBest satisfies the termination condition.

Algorithm 1: (Hybrid DQN and ELM algorithm).

1. step = 0
2. for episode in range(number of prescribed episodes):
3.     Randomly initialize state s
4.     while True:
5.         Estimate the rewards of each action and choose an action a
6.         Modify the current state s to s_ by taking a, and obtain the corresponding reward
7.         Store [s, a, reward, s_] in the replay memory pool
8.         if step satisfies the update condition:
9.             Update the target DNN
10.        if the gBest generated by state s_ satisfies the termination conditions:
11.            Evaluate the reward of this episode
12.            break
13.        step = step + 1
14. Exit
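For illustration, Algorithm 1 and the objective of Eq. (25) can be fleshed out as the following sketch. The decoding of a candidate solution (a population $X_i$ in the notation above, 260 values) and the SSE objective are concrete; agent and swarm are hypothetical stand-ins for the DQN and swarm operators described above, not code from the paper:

import numpy as np

f_E = lambda v: 1.0 / (1.0 + np.exp(-v))   # Eq. (15)

def decode(candidate):
    # One candidate = the 12*20 + 20 = 260 ELM parameters of Section 3.1.2
    w = candidate[:240].reshape(12, 20)    # input weights
    b = candidate[240:].reshape(1, 20)     # hidden biases
    return w, b

def sse(candidate, X, y):
    # Eq. (25): squared error of the ELM defined by this candidate
    w, b = decode(candidate)
    H = f_E(X @ w + b)
    beta = np.linalg.pinv(H) @ y           # Eq. (14): beta recomputed for given w, b
    return float(np.sum((H @ beta - y) ** 2))

def train(agent, swarm, X, y, episodes=100, goal_sse=5.0):
    step = 0
    for episode in range(episodes):
        s = swarm.reset()                            # line 3: random initial state
        while True:
            a = agent.choose(s)                      # line 5: epsilon-greedy on Q-DNN outputs
            s_next, reward = swarm.step(a, lambda c: sse(c, X, y))
            agent.store(s, a, reward, s_next)        # line 7: replay memory pool
            if agent.time_to_update(step):
                agent.update_target_dnn()            # line 9: copy Q-DNN weights to target DNN
            if sse(swarm.gbest, X, y) <= goal_sse:   # line 10: termination condition
                break
            s, step = s_next, step + 1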


4. Application of proposed model

4.1. Overview of case study

In order to investigate the feasibility and reliability of the proposed hybrid DQN-PSO optimizer and the ELM-based tunneling-induced ground response prediction model, an in-situ experiment conducted by Zhang et al. [48] on a practical tunneling project in Changsha city, China is used in this study. This experimental zone consisted of five tunnel sections with six metro stations. A total of 5.44 km was constructed using an earth pressure balanced (EPB) shield machine (construction started in 2016 and was completed in 2019). The tunnel was primarily excavated in weathered rocks, which means that the consolidation settlement completed rapidly after the tunnel was constructed. Therefore, this case study focused on the tunneling-induced ultimately steady ground settlement. Each monitoring cross-section of settlement was positioned at a fixed interval of around 10 m.

With regard to the collection of datasets, the geological conditions and the geometric factor at each ring were obtained by the site investigation before the tunneling process. The five operational parameters were recorded per minute by the shield machine data acquisition system, and the average operational parameters at each ring were preprocessed. The ground settlement monitoring points were installed at an interval of 10 m and were measured twice a day. The settlement of the monitoring points and the 12 input variables at the corresponding positions were stored in the database for training the ELM-based ground response prediction model, thereby synchronousness between the settlement data and the input variables can be guaranteed. The database used in this study can be downloaded via the link given in Appendix A.

4.2. Results

Table 1 presents the values of the parameters used in all algorithms in this study. The experimental results indicate the model performance is not particularly sensitive to the architecture of the target DNN and Q-DNN. The number of hidden layers in the target DNN and Q-DNN is 1, and the corresponding number of neurons is 15. The Q-DNN starts training when the number of the agent's experiences in the replay memory pool D reaches 200, and it is trained at an interval of 5 time steps. The size of the mini-batch used for training the Q-DNN is 32, and the parameters of the target DNN are updated at an interval of 300 time steps. A total of 100 game episodes are carried out by the intelligent agent. The computational results indicate the ELM-based prediction model showcases great performance with an SSE value of 5; a further decrease in the goal value would lead to a dramatic increase in the computational cost and may fail to reach the goal value. Therefore, the goal value of SSE is ultimately defined as 5 in this study.

Table 1
Values of parameters in three algorithms.

Algorithm      Parameter                           Value
ELM            Number of hidden neurons            20
PSO            ω (exploration | exploitation)      0.9 | 0.4
               c1 (exploration | exploitation)     2.5 | 0.4
               c2 (exploration | exploitation)     0.4 | 2.5
               [Vmin, Vmax]                        [−3, 3]
               [Xmin, Xmax]                        [−10, 10]
               Population size                     20
               Maximum generation                  3000
Enhanced PSO   c1 (initial | final)                2.5 | 0.5
               c2 (initial | final)                0.5 | 2.5
               NJ                                  2000
               fJ                                  0.01
DQN            Number of hidden neurons            15
               RL learn (criteria | step)          200 | 5
               Target DNN update interval          300
               Batch size                          32
               Episode                             100
               goal_SSE                            5
               Reward decay coefficient γ          0.9
               ε-greedy                            0.1
               Learning rate α                     0.01

Fig. 6 presents the evolution of the SSE value generated by the hybrid deep RL prediction model in a typical episode. It can be observed that this episode consumes 8802 steps to reach the goal SSE value. The whole evolution of SSE can be categorized into four phases according to the characteristics of the SSE variation. At phase I, from a to b, the SSE value experiences a remarkable decrease from 8.395 at the 1st step to 5.288 at the 1045th step. Thereafter, the change in the SSE value is not discernable, but three steady phases can be obviously observed. The first steady phase (phase II: b–c) continues for a total of 3373 steps, followed by phase III (c–d) with 2919 steps and phase IV (d–e) with 665 steps.

[Fig. 6. Evolution of SSE value in a typical episode. Marked points: a (1, 8.395), b (1045, 5.288), c (4418, 5.22), d (7337, 5.1), e (8802, 4.997); phases I (a–b), II (b–c), III (c–d), IV (d–e); the dashed line denotes goal_SSE = 5.]

The advantage of the RL algorithm DQN is that it can reveal the intelligent operation mechanism of the agent, while other ML-based models merely run as a black box. To investigate the operation mechanism of the DQN-PSO optimizer, the actions at the four phases are presented, as shown in Fig. 7. At phase I, it can be observed that the agent focuses on exploration at the initial stage, implying this action can receive the largest reward based on the Q-DNN results. The performance of the hybrid deep RL prediction model at the initial stage is not steady, thereby the optimization of the weights and biases can easily improve the prediction accuracy, which complies with the obvious decrease in the SSE value (see Fig. 6). At the early stage of phase II, the agent still starts from exploration, but this action cannot reduce the SSE value, thereby the agent transfers to the exploitation action, and sometimes conducts the jump action for jumping out of local optima. Consequently, the exploitation and jump actions alternately appear and dominate this phase. At phase III, the agent focuses on exploitation, because the action trials at phase II cannot cause a large decrease in the SSE value (see Fig. 6). It indicates the performance of the hybrid deep RL prediction model is roughly steady, and SSE will converge at a fixed value. A similar condition can also be observed at phase IV, where exploitation still dominates. The SSE value varies within an acceptable range and ends up at the prescribed goal value, thereby it is reasonable to deduce that the optimum hybrid deep RL prediction model for predicting tunneling-induced ground responses is obtained. The consistency of the agent's actions and the corresponding model performance at each phase ensures the reasonability of the hybrid deep RL prediction model. The agent, like a human, intelligently guides particles to choose the optimum action at each generation and move towards the best position.

To clearly reveal the evolution of the prediction performance of the model, the predicted maximum settlements for the test set using the hybrid deep RL prediction model generated at three typical steps a, b and e are presented, as shown in Fig. 8. It can be seen that the settlement predicted using the model generated at step a severely deviates from the measured settlement. It cannot accurately capture the evolution of tunneling-induced settlement and loses fidelity at some monitoring points; e.g., the largest settlement of 48 mm is not detected. The performance of the model generated at step b improves dramatically. The predicted evolution of settlement shows great agreement with the measured settlement. Meanwhile, all of the large settlements that exceed 10 mm can be detected by this model, which is of great significance for avoiding risks in engineering practice. The performance of the model generated at step e is further refined with the lower SSE value, compared with the model generated at step b. In detail, the difference between the predicted and measured settlements at some monitoring points further reduces and shows better consistency with the measured evolution of the ground maximum settlement.

5. Discussion

5.1. Comparison with basic and enhanced PSO

To validate the superiority of the proposed RL-based optimizer DQN-PSO, a comparison among three optimizers, that is, basic PSO, enhanced PSO and DQN-PSO, is conducted. Fig. 9 presents the results of the ELM-based ground response prediction model optimized by the three optimizers. The evolution of the SSE value within 3000 generations is presented because the three optimizers roughly converged at a fixed value by then. It can be observed that DQN-PSO obviously outperforms the basic and enhanced PSO with the lowest value of SSE and the fastest convergence. The generation at which the SSE value of each optimizer virtually converges at a constant value is presented in Table 2. In detail, the SSE value optimized by the DQN-PSO starts to be less than those of the basic and enhanced PSO when the number of generations exceeds 10, because the DQN-PSO optimizer always guides particles to select a correct action. Meanwhile, the whole optimization process virtually completes at around the 1000th generation with an SSE value of 5.288; thereafter the objective of the search operation is merely to achieve the prescribed goal value of SSE, and the computational cost is expensive.


[Fig. 7. Actions at four phases. Four panels (phases I–IV) plot the action selected at each step (exploration, exploitation or jump) against the step number.]

It can be seen from Fig. 9 that the computational cost for decreasing the SSE value from 5.288 to the prescribed goal value is approximately seven times that for decreasing the SSE value from the initial value to 5.288. It is noteworthy that the enhanced PSO also outperforms the basic PSO, with a lower value of SSE from the 326th generation. The enhanced PSO indeed further optimizes the search trajectory of the particles to a certain extent, but the key challenges, including which action should be chosen and when this action should be taken, are still dodged. It means that the enhanced PSO cannot avoid being trapped in local optima, thereby the decrease in the converged SSE value is not discernable compared with the basic PSO. The basic PSO conducts the exploration action throughout the whole optimization process, thereby it is easily trapped in local optima. The premature convergence problem is obvious, because the SSE value roughly maintains constant when the number of generations reaches 500.

Fig. 10 presents the evolution of ground responses for the test set predicted by the ELM-based prediction model optimized by the three optimizers, as well as the MAE values computed using Eq. (26). It can be seen that the hybrid deep RL model outperforms the ELM-based prediction models optimized by PSO and enhanced PSO. The enhanced PSO slightly refines the predicted settlement evolution, with a slight decrease in the MAE value (from 2.64 to 2.51). The improvement in the prediction performance of the hybrid deep RL model is remarkable, in which the MAE value decreases to 1.97. Great agreement between the predicted and measured evolution of settlement and an improvement in recognizing the maximum settlement are observed. Meanwhile, all datasets are closer to the line with the slope of 1. Hence the tunneling-induced ground response prediction model can be established using the hybrid deep RL algorithm.

$MAE = \dfrac{1}{n} \sum_{i=1}^{n} |r_i - p_i|$   (26)

where r = measured settlement; p = predicted settlement; n = total number of datasets.
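Eq. (26) in code form is a trivial but convenient helper (an illustrative sketch only):

import numpy as np

def mae(measured, predicted):
    # Eq. (26): mean absolute error over the n test datasets
    measured, predicted = np.asarray(measured), np.asarray(predicted)
    return float(np.mean(np.abs(measured - predicted)))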

[Fig. 8. Predicted settlement for the test set using the hybrid deep RL model generated at three steps (a, b and e); each panel plots measured and predicted settlement (mm) over the 60 test datasets.]

[Fig. 9. Comparison of DQN-PSO optimizer with basic and enhanced PSO optimizers: evolution of the SSE value (mm) over 3000 generations.]

Table 2
Comparison among three optimizers.

Optimizer      Generation   SSE
Basic PSO      1497         6.860
Enhanced PSO   1280         6.540
DQN-PSO        1045         5.288
5.2. Sensitivity analysis

To evaluate the performance of the proposed ELM-based settlement prediction model optimized by DQN-PSO, a global sensitivity analysis (GSA) is conducted to reveal how the model output uncertainty can be apportioned to the uncertainty in each input variable [55]. The variance-based GSA method has been extensively used in many domains [56–58], thereby it is used in this study. In this method, the total order index $S_{Ti}$ measures the effect of an input parameter, together with its coupled effect with other input parameters, on the model output. The calculation of $S_{Ti}$ proposed by Jansen [59] is adopted in this study; the detailed formulations are not presented for brevity and can be found in Zhang [60]. The results of the GSA are shown in Fig. 11, compared with the correlation coefficients, which are calculated as absolute Pearson coefficients (see Eq. (27)). It can be observed that the parameters that have strong correlations with settlement (Sp, St, C) still have a higher impact on the ELM-based model. Th, with the highest Pearson value among the five operational parameters, is also the most important operational parameter in the ELM-based model. Pr, with the lowest Pearson value, is also the most insignificant parameter in the ELM-based model. The rank of the other parameters merely has a slight variation. These facts indicate the ELM-based model optimized by DQN-PSO obviously captures the potential correlations between the input and output parameters. The generalization ability and practicability of the model can thus be guaranteed.

$R = \dfrac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n \sum_{i=1}^{n} x_i^2 - \big(\sum_{i=1}^{n} x_i\big)^2}\ \sqrt{n \sum_{i=1}^{n} y_i^2 - \big(\sum_{i=1}^{n} y_i\big)^2}}$   (27)

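For reference, the absolute Pearson coefficient of Eq. (27) used in Fig. 11 can be computed as follows (an illustrative sketch only):

import numpy as np

def pearson(x, y):
    # Eq. (27): sample Pearson correlation coefficient between an input x and the settlement y
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = np.sqrt(n * np.sum(x**2) - np.sum(x)**2) * np.sqrt(n * np.sum(y**2) - np.sum(y)**2)
    return num / den
# Fig. 11 compares abs(pearson(...)) with the variance-based total order indices.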

[Fig. 10. Predicted settlement for the test set using ELM-based prediction models optimized by three optimizers. Each row plots measured versus predicted settlement (mm) over the 60 test datasets together with a predicted-versus-measured scatter; MAE = 2.64 (basic PSO), 2.51 (enhanced PSO) and 1.97 (DQN-PSO).]

[Fig. 11. Comparison between sensitivity indices and correlation coefficients of input parameters. Both rankings are led by Sp, St and C and end with Pr.]

6. Conclusions

The contribution of this study is that a hybrid deep reinforcement learning (RL) model, which integrates the extreme learning machine (ELM) and the deep RL algorithm deep-Q network (DQN), is proposed for predicting tunneling-induced ground responses in real time, in which the relationships among influential factors and ground responses are explored through self-practicing. Another contribution is that the proposed optimizer DQN-PSO knows which action should be conducted and when this action should be taken, thereby ensuring the global optima can be obtained. Unlike previous metaheuristic optimization algorithms that guide the movement of particles in a rough manner, the reward rule of the DQN-based optimizer focuses on evaluating the reward of the agent's action, hence particles, like an intelligent human, always select the optimum action at each step. To the authors' best knowledge, this is the first work on using the hybrid RL algorithm DQN and the ML algorithm ELM to investigate tunneling-induced ground responses. The following conclusions can be drawn based on the results of this work:

(1) Because the DQN-PSO optimizer is able to guide particles to implement the optimum action at each step, the global optima can be acquired when the value of the objective function converges at a fixed value. In other words, the DQN-PSO optimizer can search the global best weights and biases of the ELM with higher accuracy and lower computational cost, compared with basic or enhanced metaheuristic optimization algorithms.

(2) The hybrid deep RL model with the integration of ELM and the DQN-PSO optimizer can accurately predict tunneling-induced ground responses in real time, overcoming the deficiency of empirical, analytical and numerical models established by domain experts. The ultimate ELM-based model can be expressed with an explicit formulation, which is user-friendly in engineering practice. Meanwhile, the performance of the prediction model can be improved with the increase in the datasets collected from field construction.

(3) The hybrid deep RL model is generic, which means that it can be applied to various situations with different states, actions, rules, rewards and objective functions defined by domain experts without any debugging. Meanwhile, the basic meta-heuristic and machine learning algorithms used in the hybrid deep RL model can be freely replaced depending on the situation. Such a model offers a pragmatic and reliable framework to develop a data-driven or physical model.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This study is sponsored by the National Natural Science Foundation of China (No. 51938005) and the program of High-level Talent of Innovative Research Team of Hunan Province 2019 (No. 2019RS1030). The authors greatly appreciate these financial supports during this research.


Appendix A

The database used in this study can be downloaded at the following link: https://www.researchgate.net/publication/336208927_Database_for_maximum_settlement_collected_from_Changsha_Metro_Line_4_Liugoulong_to_Fubuhe_station.

References

[1] P. Zhang, H.-N. Wu, R.-P. Chen, T.H.T. Chan, Hybrid meta-heuristic and machine learning algorithms for tunneling-induced settlement prediction: A comparative study, Tunnell. Undergr. Space Technol. 99 (2020) 103383.
[2] C. Sagaseta, Analysis of undrained soil deformation due to ground loss, Géotechnique 37 (1987) 301–320.
[3] A. Verruijt, J.R. Booker, Surface settlements due to deformation of a tunnel in an elastic half plane, Géotechnique 48 (1996) 709–713.
[4] H.S. Yu, R.K. Rowe, Plasticity solutions for soil behaviour around contracting cavities and tunnels, Int. J. Numer. Anal. Met. 23 (1999) 1245–1279.
[5] F. Pinto, A.J. Whittle, Ground movements due to shallow tunnels in soft ground. I: analytical solutions, J. Geotech. Geoenviron. 140 (2014) 04013040.
[6] W. Broere, D. Festa, Correlation between the kinematics of a Tunnel Boring Machine and the observed soil displacements, Tunnell. Undergr. Space Technol. 70 (2017) 125–147.
[7] S. Suwansawat, H.H. Einstein, Describing settlement troughs over twin tunnels using a superposition technique, J. Geotech. Geoenviron. Eng. 133 (2007) 445–468.
[8] P. Zhang, Z.Y. Yin, R.P. Chen, Analytical and semi-analytical solutions for describing tunneling-induced transverse and longitudinal settlement troughs, Int. J. Geomech. (2020), https://doi.org/10.1061/(ASCE)GM.1943-5622.0001748.
[9] R.B. Peck, Deep excavations and tunneling in soft ground, in: Proceedings of 7th International Conference on Soil Mechanics and Foundation Engineering, Mexico City, 1969, pp. 225–290.
[10] P. Zhang, R.-P. Chen, H.-N. Wu, Y. Liu, Ground settlement induced by tunneling crossing interface of water-bearing mixed ground: A lesson from Changsha, China, Tunnell. Undergr. Space Technol. 96 (2020) 103224.
[11] X.-T. Lin, R.-P. Chen, H.-N. Wu, H.-Z. Cheng, Deformation behaviors of existing tunnels caused by shield tunneling undercrossing with oblique angle, Tunnell. Undergr. Space Technol. 89 (2019) 78–90.
[12] J. Yang, Z.Y. Yin, X.F. Liu, F.P. Gao, Numerical analysis for the role of soil properties to the load transfer in clay foundation due to the traffic load of the metro tunnel, Transp. Geotech. 23 (2020) 100336.
[13] J. Ninić, S. Freitag, G. Meschke, A hybrid finite element and surrogate modelling approach for simulation and monitoring supported TBM steering, Tunnell. Undergr. Space Technol. 63 (2017) 12–28.
[14] A. Pourtaghi, M.A. Lotfollahi-Yaghin, Wavenet ability assessment in comparison to ANN for predicting the maximum surface settlement caused by tunneling, Tunnell. Undergr. Space Technol. 28 (2012) 257–271.
[15] P. Zhang, Z.-Y. Yin, Y.-F. Jin, T.H.T. Chan, A novel hybrid surrogate intelligent model for creep index prediction based on particle swarm optimization and random forest, Eng. Geol. 265 (2020) 105328.
[16] P. Zhang, Z.Y. Yin, Y.F. Jin, G.L. Ye, An AI-based model for describing cyclic characteristics of granular materials, Int. J. Numer. Anal. Met. (2020) 1–21.
[17] P. Zhang, Z.Y. Yin, Y.F. Jin, T. Chan, Intelligent modelling of clay compressibility using hybrid meta-heuristic and machine learning algorithms, Geosci. Front. (2020), in press.
[18] D.J. Armaghani, E.T. Mohamad, M.S. Narayanasamy, N. Narita, S. Yagiz, Development of hybrid intelligent models for predicting TBM penetration rate in hard rock condition, Tunnell. Undergr. Space Technol. 63 (2017) 29–43.
[19] Z.Y. Yin, Y.F. Jin, S.L.J. Shen, P.Y. Hicher, Int. J. Numer. Anal. Met. 42 (2017) 1–25.
[20] K.E. Parsopoulos, Parallel cooperative micro-particle swarm optimization: A master–slave model, Appl. Soft Comput. 12 (2012) 3552–3579.
[21] W.H. Lim, N.A. Mat Isa, Two-layer particle swarm optimization with intelligent division of labor, Eng. Appl. Artif. Intel. 26 (2013) 2327–2348.
[22] A. Gálvez, A. Iglesias, A new iterative mutually coupled hybrid GA–PSO approach for curve fitting in manufacturing, Appl. Soft Comput. 13 (2013) 1491–1504.
[23] S.Z. Zhao, P.N. Suganthan, Q.-K. Pan, M. Fatih Tasgetiren, Dynamic multi-swarm particle swarm optimizer with harmony search, Expert Syst. Appl. 38 (2011) 3735–3742.
[24] M.A. Lopes Silva, S.R. de Souza, M.J. Freitas Souza, A.L.C. Bazzan, A reinforcement learning-based multi-agent framework applied for solving routing and scheduling problems, Expert Syst. Appl. 131 (2019) 148–171.
[25] K. Zhang, H. Zhang, Y. Mu, S. Sun, Tracking control optimization scheme for a class of partially unknown fuzzy systems by using integral reinforcement learning architecture, Appl. Math. Comput. 359 (2019) 344–356.
[26] Y. Ding, L. Ma, J. Ma, M. Suo, L. Tao, Y. Cheng, C. Lu, Intelligent fault diagnosis for rotating machinery using deep Q-network based health state classification: A deep reinforcement learning approach, Adv. Eng. Inform. 42 (2019).
[27] F. Hourfar, H.J. Bidgoly, B. Moshiri, K. Salahshoor, A. Elkamel, A reinforcement learning approach for waterflooding optimization in petroleum reservoirs, Eng. Appl. Artif. Intel. 77 (2019) 98–116.
[28] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis, Mastering the game of Go with deep neural networks and tree search, Nature 529 (2016) 484–489.
[29] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Human-level control through deep reinforcement learning, Nature 518 (2015) 529–533.
[30] N. Brown, T. Sandholm, Superhuman AI for multiplayer poker, Science (2019) 1–12.
[31] S. Dargan, M. Kumar, M.R. Ayyagari, G. Kumar, A survey of deep learning and its applications: a new paradigm to machine learning, Arch. Comput. Methods Eng. (2019).
[32] J. Shi, J.A.R. Ortigao, J. Bai, Modular neural networks for predicting settlements during tunneling, J. Geotech. Geoenviron. Eng. 124 (1998) 389–395.
[33] C.Y. Kim, G.J. Bae, S.W. Hong, C.H. Park, Neural network based prediction of ground surface settlements due to tunnelling, Comput. Geotech. 28 (2001) 517–547.
[34] S. Suwansawat, H.H. Einstein, Artificial neural networks for predicting the maximum surface settlement caused by EPB shield tunneling, Tunnell. Undergr. Space Technol. 21 (2006) 133–150.
[35] O.J. Santos, T.B. Celestino, Artificial neural networks analysis of São Paulo subway tunnel settlement data, Tunnell. Undergr. Space Technol. 23 (2008) 481–491.
[36] A. Marto, M. Hajihassani, R. Kalatehjari, E. Namazi, H. Sohaei, Simulation of longitudinal surface settlement due to tunnelling using artificial neural network, Int. Rev. Modelling Simul. 5 (2012) 1024–1031.
[37] A. Darabi, K. Ahangari, A. Noorzad, A. Arab, Subsidence estimation utilizing various approaches – A case study: Tehran No. 3 subway line, Tunnell. Undergr. Space Technol. 31 (2012) 117–127.
[38] R. Boubou, F. Emeriault, R. Kastner, Artificial neural network application for the prediction of ground surface movements induced by shield tunnelling, Can. Geotech. J. 47 (2010) 1214–1233.
[39] M. Hasanipanah, M. Noorian-Bidgoli, D. Jahed Armaghani, H. Khamesi, Feasibility of PSO-ANN model for predicting surface settlement caused by tunneling, Eng. Comput-Germany 32 (2016) 705–715.
[40] R.P. Chen, P. Zhang, X. Kang, Z.Q. Zhong, Y. Liu, H.N. Wu, Prediction of maximum surface settlement caused by EPB shield tunneling with ANN methods, Soils Found. 59 (2019) 284–295.
[41] R.P. Chen, P. Zhang, H.N. Wu, Z.T. Wang, Z.Q. Zhong, Prediction of shield tunneling-induced ground settlement using machine learning techniques, Front. Struct. Civ. Eng. 13 (2019) 1363–1378.
[42] K. Ahangari, S.R. Moeinossadat, D. Behnia, Estimation of tunnelling-induced settlement by modern intelligent methods, Soils Found. 55 (2015) 737–748.
[43] D. Bouayad, F. Emeriault, Modeling the relationship between ground surface settlements induced by shield tunneling and the operational and geological parameters based on the hybrid PCA/ANFIS method, Tunnell. Undergr. Space Technol. 68 (2017) 142–152.
[44] F. Wang, B. Gou, Y. Qin, Modeling tunneling-induced ground surface settlement development using a wavelet smooth relevance vector machine, Comput. Geotech. 54 (2013) 125–132.
[45] L.M. Zhang, X.G. Wu, W.Y. Ji, S.M. AbouRizk, Intelligent approach to estimation of tunnel-induced ground settlement using wavelet packet and support vector machines, J. Comput. Civil. Eng. 31 (2017) 04016053.
[46] J. Zhou, X. Shi, K. Du, X.Y. Qiu, X.B. Li, H.S. Mitri, Feasibility of random-forest approach for prediction of ground settlements induced by the construction of a shield-driven tunnel, Int. J. Geomech. 17 (2016) 04016129.
[47] V.R. Kohestani, M.R. Bazargan-Lari, J. Asgari-marnani, Prediction of maximum surface settlement caused by earth pressure balance shield tunneling using random forest, J. AI Data Mining 5 (2017) 127–135.
[48] P. Zhang, R.P. Chen, H.N. Wu, Real-time analysis and regulation of EPB shield steering using Random Forest, Automat. Constr. 106 (2019) 102860.
[49] C.J.C.H. Watkins, P. Dayan, Q-learning, Mach. Learn. 8 (1992) 279–292.
[50] Y. Yuan, Z.L. Yu, Z. Gu, Y. Yeboah, W. Wei, X. Deng, J. Li, Y. Li, A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning, Knowl-Based Syst. 175 (2019) 107–117.
[51] J. Kennedy, R. Eberhart, Particle swarm optimization, in: IEEE International Conference on Neural Networks, Perth, Australia, 1995, pp. 1942–1948.
[52] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: Theory and applications, Neurocomputing 70 (2006) 489–501.
[53] C.R. Rao, S.K. Mitra, Generalized Inverse of Matrices and its Applications, Wiley, 1971.
[54] L. Yang, H. Su, Z. Wen, Improved PLS and PSO methods-based back analysis for elastic modulus of dam, Adv. Eng. Softw. 131 (2019) 205–216.
[55] A. Saltelli, I.M. Sobol, About the use of rank transformation in sensitivity analysis of model output, Reliab. Eng. Syst. Safe. 50 (1995) 225–239.
[56] C.Y. Zhao, A.A. Lavasan, R. Hölter, T. Schanz, Mechanized tunneling induced building settlements and design of optimal monitoring strategies based on sensitivity field, Comput. Geotech. 97 (2018) 246–260.
[57] L.M. Zhang, X.G. Wu, H.P. Zhu, S.M. AbouRizk, Performing global uncertainty and sensitivity analysis from given data in tunnel construction, J. Comput. Civil. Eng. 31 (2017) 04017065.
[58] K.M. Hamdia, H. Ghasemi, X.Y. Zhuang, N. Alajlan, T. Rabczuk, Sensitivity and uncertainty analysis for flexoelectric nanostructures, Comput. Method Appl. M. 337 (2018) 95–109.
[59] M.J.W. Jansen, Analysis of variance designs for model output, Comput. Phys. Commun. 117 (1999) 35–43.
[60] P. Zhang, A novel feature selection method based on global sensitivity analysis with application in machine learning-based prediction model, Appl. Soft Comput. 85 (2019) 105859.

