
Catena 166 (2018) 181–191

Contents lists available at ScienceDirect

Catena
journal homepage: www.elsevier.com/locate/catena

Prediction of shear strength of soft soil using machine learning methods

Binh Thai Pham a, Le Hoang Son b, Tuan-Anh Hoang b, Duc-Manh Nguyen c, Dieu Tien Bui d,e,⁎

a Geotechnical Engineering and Artificial Intelligence research group (GEOAI), University of Transport Technology, Ha Noi, Viet Nam
b VNU University of Science, Vietnam National University, Viet Nam
c Department of Geotechnical Engineering, University of Transport and Communication, Ha Noi, Viet Nam
d Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Viet Nam
e Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Viet Nam

ARTICLE INFO

Keywords:
Artificial neural networks
Machine learning
Particle swarm optimization
Strength of soft soils

ABSTRACT

Shear strength of soil is an important engineering parameter used in the design and audit of geotechnical structures. In this research, we investigate and compare the performance of four machine learning methods, Particle Swarm Optimization - Adaptive Network based Fuzzy Inference System (PANFIS), Genetic Algorithm - Adaptive Network based Fuzzy Inference System (GANFIS), Support Vector Regression (SVR), and Artificial Neural Networks (ANN), for predicting the strength of soft soils. For this purpose, 188 plastic clay soil samples collected from two major projects in Viet Nam, the Nhat Tan and Cua Dai bridges, were used to generate training and testing datasets for constructing and validating the models. The models were validated and compared using the root mean square error (RMSE) and the correlation coefficient (R). The results show that PANFIS has the highest prediction capability (RMSE = 0.038 and R = 0.601), followed by GANFIS (RMSE = 0.04 and R = 0.569), SVR (RMSE = 0.044 and R = 0.549), and ANN (RMSE = 0.059 and R = 0.49). It can be concluded that, of the four models, PANFIS is the most promising technique for predicting the strength of soft soils.

1. Introduction

In geotechnical engineering, the shear strength of soil is an important engineering parameter used in the design and audit of many geo-environmental and geotechnical structures, e.g., road foundations and pavements, earth dams, and retaining walls (Vanapalli and Fredlund, 2000). It is determined by two parameters, the internal friction angle and the unit cohesion (Das and Sobhan, 2013), and is affected by several factors, namely plasticity index (PI), liquid limit (LL), moisture content (W), clay content (CC), etc. (Das and Sobhan, 2013; Kaya, 2009). For soil treated with cement grout, shear strength increases with the approximate volume of the grouted zone, as shown in a study of the effects of permeating cement grout with fly ash into a sandy soil skeleton (Ali and Yousuf, 2016; Vanapalli and Fredlund, 2000).

Many studies have been carried out to predict the shear strength of soft soils. Motaghedi and Eslami (2014) proposed an analytical approach for predicting c and φ using all CPT quantities (qc, u2, and fs), considering the bearing capacity failure mechanism at the cone tip and direct shear failure along the penetrometer sleeve. McGann et al. (2015) used multiple linear regression to develop a Christchurch-specific empirical correlation for predicting soil shear wave velocities (Vs) from cone penetration test (CPT) data. Azari et al. (2014) studied the effects of shear strength variation in the disturbed zone on the time-dependent behavior of soft soil deposits improved with vertical drains and preloading. Griffiths et al. (2016) used equivalent linear and nonlinear 1D site response analyses for the well-known Treasure Island site to demonstrate the challenges of accurately modeling large shear strains, and the subsequent surface response, at soft soil sites. Oliveira et al. (2017) investigated constitutive models to simulate the creep behavior of a soft soil in its natural state or chemically stabilized state. These studies suggest that a well-established mathematical model is needed to achieve high prediction accuracy.

In recent decades, machine learning and artificial intelligence methods have been applied widely to build prediction models of material properties (Shahin et al., 2009; Pham et al., 2017; Pourghasemi and Rahmati, 2018; Shirzadi et al., 2017). Samui (2008) applied Support Vector Regression (SVR) to predict the friction capacity of driven piles in clay soils. The behavior of shallow foundations, including bearing capacity, was also predicted using Artificial Neural Networks (ANN) in several studies (Kuo et al., 2009; Padmini et al., 2008). Chou et al. (2016) used data mining techniques including linear regression, classification and regression tree (CART) analysis, a


⁎ Corresponding author at: Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Viet Nam.
E-mail address: [email protected] (D. Tien Bui).

https://doi.org/10.1016/j.catena.2018.04.004
Received 2 February 2018; Received in revised form 9 March 2018; Accepted 3 April 2018
0341-8162/ © 2018 Elsevier B.V. All rights reserved.
B.T. Pham et al. Catena 166 (2018) 181–191

generalized linear (GENLIN) model, chi-squared automatic interaction detection (CHAID), ANN, and SVR to identify factors influencing shear strength and to predict the peak friction angle of FRS. Several studies have addressed the prediction of soil shear strength. Das et al. (2011) studied the potential of SVM and ANN for predicting the residual strength of soil. Kanungo et al. (2014) compared the ANN and CART techniques for predicting shear strength parameters. Kiran et al. (2016) applied a Probabilistic Neural Network (PNN) to predict the shear strength parameters of soil, viz., cohesion "c" and internal friction angle "φ", from water content (w), Plasticity Index (PI), Dry Density (DD), Gravel % (GP), Sand % (SP), Silt % (STP), and Clay % (CP). Khan et al. (2016) investigated the prediction of the residual strength of clay using a new model, namely a functional network (FN). In general, the common conclusion of these works is that machine learning methods are efficient for predicting the shear strength of soft soils (Moavenian et al., 2016).

Recent developments in machine learning and optimization have produced promising new soft computing methods, e.g., Particle Swarm Optimization - Adaptive Network based Fuzzy Inference System (PANFIS) and Genetic Algorithm - Adaptive Network based Fuzzy Inference System (GANFIS). PANFIS and GANFIS are state-of-the-art methods formed by integrating meta-heuristic optimization algorithms with neuro-fuzzy models. They have proven to be powerful tools for predicting various environmental problems such as floods (Bui et al., 2016a), forest fires (Bui et al., 2017), hydropower dam displacement (Bui et al., 2016b), and landslides (Chen et al., 2017). Support Vector Regression (SVR) and Artificial Neural Networks (ANN), on the other hand, are popular and efficient methods for shear strength modeling. However, PANFIS and GANFIS have not yet been investigated and compared with SVR and ANN for predicting the shear strength of soft soils.

In this study, we expand the body of knowledge by investigating and comparing the prediction performance of PANFIS, GANFIS, SVR, and ANN for the shear strength of soft soil. Comparing these machine learning methods is important for identifying an effective prediction model that can be used in practice to estimate the shear strength of soft soils.

The rest of the paper is organized as follows. Section 2 describes the study sites and the dataset. Section 3 gives the background of the models, including PANFIS, GANFIS, SVR, and ANN. Sections 4 and 5 present the results and the discussion, respectively. Lastly, Section 6 draws conclusions and suggests further studies. MATLAB R2014b and Weka 3.8.1 were used for dataset generation and modeling.

2. Study site and data

2.1. Description of the study site

In this research, plastic clay soil samples from two bridge construction projects in Vietnam, the Nhat Tan Bridge (Ha Noi City) and the Cua Dai Bridge (Quang Nam Province), were used as case studies. The Nhat Tan Bridge is located at about latitude 20°50′30″N and longitude 106°41′37″E, whereas the Cua Dai Bridge is located at latitude 15°53′25″N and longitude 108°20′42″E (red points on the map in Fig. 1). The main beam system of the Nhat Tan Bridge was designed and constructed as a cable-stayed structure with five diamond towers and six spans. The total length of the Cua Dai Bridge is 18.3 km, and the part of the bridge over the river is 1.481 km long.

2.2. Data

A total of 188 samples from the two bridge projects were collected and used to generate the datasets for modeling. In this prediction problem, shear strength is the output variable, whereas the input variables are moisture content, clay content, liquid limit, plastic limit, plasticity index, and consistency index.

2.2.1. Shear strength

"Shear strength (τ) of a soil mass is the internal resistance per unit area that the soil mass can offer to resist failure and sliding along any plane inside it" (Das and Sobhan, 2013). It is an important factor in analyzing soil stability problems, including slope stability, lateral pressure on earth-retaining structures, and bearing capacity. A soil mass fails not because of shear stress or maximum normal stress alone, but because of a critical combination of shear stress and normal stress (Das and Sobhan, 2013). Therefore, the functional relationship between shear stress and normal stress on a failure plane of a soil mass can be expressed as follows:

$$\tau = f(\sigma) = \sigma \tan\varphi + c \tag{1}$$

where σ (kg/cm²) is the normal stress on the failure plane, φ is the angle of internal friction, and c (kg/cm²) is the cohesion (Das and Sobhan, 2013).

In the laboratory, the shear strength parameters (c, φ) can be determined with different experiments, namely the direct shear test, the triaxial test, and the torsional ring shear test (Das and Sobhan, 2013; Whitlow, 1990). In general, determining these parameters to calculate the shear strength of a soil mass is relatively complicated and costly. In this study, assuming σ = 1 kg/cm², the shear strength was calculated from the parameters (c, φ) determined by direct shear tests on the 188 plastic clay soil samples as follows:

$$\tau = \tan\varphi + c, \qquad \sigma = 1\ \text{kg/cm}^2 \tag{2}$$

The shear strength data of the 188 plastic clay soil samples are shown in Fig. 2. The τ values range from 0.104 to 0.301 kg/cm², the mean value is 0.197 kg/cm², and the standard deviation is 0.047 kg/cm².

2.2.2. Moisture content

"Moisture content (ω) is also referred to as water content and is defined as the ratio of the weight of water to the weight of solids in a given volume of soil" (Das and Sobhan, 2013; Whitlow, 1990). Moisture content affects the shear strength of soil because it reduces the cohesive forces between soil solids and can even lead to saturation of the soil; as moisture content increases, shear strength decreases (Sharma and Bora, 2003). Thus, moisture content was taken into account as a factor affecting the shear strength of soils in this study. Moisture content is determined in the laboratory by the oven drying method or in the field by the alcohol burning method. It can be calculated using the following equation (Das and Sobhan, 2013; Whitlow, 1990):

$$\omega(\%) = \frac{W_\omega}{W_s} \times 100 = \frac{m_\omega g}{m_s g} \times 100 \tag{3}$$

where Wω is the weight of water in the soil sample, Ws is the weight of solids in the soil sample, mω is the mass of water in the soil sample, ms is the mass of solids in the soil sample, and g is the gravitational acceleration (g = 9.81 m/s²). In this study, the moisture content test was carried out in the laboratory, and the moisture content values of the 188 samples are shown in Fig. 3a. They vary from 24.19 to 141.83 (%), the mean value is 56.1 (%), and the standard deviation is 19.1 (%).

2.2.3. Clay content

Clays are classified as soil solids smaller than 0.002 mm in size. In several cases, soil solids between 0.002 and 0.005 mm in size are also considered clays (Das and Sobhan, 2013). Clay content (μ) was considered a factor affecting the shear strength of soils because it develops soil plasticity; as clay content increases, the shear strength of soils mixed with a limited amount of water decreases. Clay content can be determined in the laboratory by grain size distribution analysis using the following equation (Das and Sobhan, 2013):


Fig. 1. Location of sample collection.

$$\mu(\%) = \frac{M_{0.005}}{M_{\text{sum}}} \times 100 \tag{4}$$

where M0.005 is the mass of soil solids passing the 0.005 mm sieve and Msum is the total mass of the soil sample. In this study, grain size distribution analysis was carried out in the laboratory, and the clay contents of the 188 samples are shown in Fig. 3b. The clay content values range from 11 to 87 (%), the mean value is 49.8 (%), and the standard deviation is 17.2 (%).

2.2.4. Liquid limit

Liquid Limit (LL), one of the Atterberg limits, is defined as the moisture content at the point of transition from the plastic to the liquid state (Das and Sobhan, 2013). Liquid limit is related to the shear strength of soils: as it increases, shear strength decreases (Sharma and Bora, 2003). This limit can be determined by laboratory testing with Atterberg tools (Das and Sobhan, 2013) and calculated using the following equation:

$$LL(\%) = \frac{W_{\text{liquid}}}{W_s} \times 100 \tag{5}$$

where Wliquid is the weight of water in the soil sample at the point of transition from the plastic to the liquid state, determined from the laboratory test using Atterberg tools (Das and Sobhan, 2013). In this study, the Atterberg test was carried out in the laboratory to determine the LL, and the LL values of the 188 samples are shown in Fig. 3c. They vary from 25.17 to 147.08 (%), the mean value is 59.9 (%), and the standard deviation is 20.5 (%).

2.2.5. Plastic limit

Plastic Limit (PL) is an Atterberg limit defined as the moisture content at the point of transition from the semisolid to the plastic state (Das and Sobhan, 2013). Plastic limit is related to the shear strength of soils: as it increases, shear strength decreases (Sharma and Bora, 2003). It can be determined by laboratory testing with Atterberg tools (Das and Sobhan, 2013) and calculated using the following equation:

$$PL(\%) = \frac{W_{\text{plastic}}}{W_s} \times 100 \tag{6}$$

where Wplastic is the weight of water in the soil sample at the point of transition from the semisolid to the plastic state, determined from the laboratory test using Atterberg tools (Das and Sobhan, 2013). In this study, the Atterberg test was carried out in the laboratory to determine the PL, and the PL values of the 188 samples are shown in Fig. 3d. The PL values

Fig. 2. Shear strength of the soil samples.

Fig. 3. Geotechnical properties of the soil samples: (a) moisture content, (b) clay content, (c) liquid limit, and (d) plastic limit.

range from 13.31 to 99.33 (%), the mean value is 35.45 (%), and the standard deviation is 13 (%).

2.3. Data preparation for modeling

To generate the datasets for modeling, soil strength is taken as the dependent variable (Y), whereas moisture content, clay content, liquid limit, and plastic limit are taken as the independent variables X1, X2, X3, and X4, respectively. The data are presented in Table 1.

The data were divided into two parts: a training dataset (70%) and a validating dataset (30%). Different splitting strategies were tried to obtain the best fit for each model, and the statistics of the data used for each model are shown in Table 2. The training dataset was then used to train the models, whereas the validating dataset was used to test them.

Table 1
Data presentation.
No.   X1     X2      X3     X4      Y
1     54.5   86.05   45.45  80.29   0.16
2     23.5   140.51  97.58  134.39  0.17
3     42.0   55.38   35.69  51.61   0.18
4     41.5   94.56   60.34  87.67   0.16
5     15.0   131.34  84.23  121.34  0.14
6     11.5   125.57  76.40  125.42  0.11
7     17.5   147.08  99.33  141.83  0.14
8     32.0   97.22   58.80  86.98   0.16
…     …      …       …      …       …
186   55.0   65.97   29.87  62.63   0.14
187   20.5   77.32   41.05  71.36   0.14
188   20.5   109.70  61.43  99.22   0.14

3. Background of the methods used

3.1. Adaptive neuro-fuzzy inference system

Adaptive neuro-fuzzy inference system (ANFIS) is a neuro-fuzzy system that combines the advantages of an ANN and a fuzzy system to construct powerful prediction models, and it has been applied successfully in many fields. The ANFIS structure consists of five layers (Fig. 4):

Layer 1: the input layer, which passes the inputs on to the next layers. In this study, each input sample consists of the four input variables X1 to X4.

Layer 2: an adaptive layer. The membership value of each controlling factor is calculated from the membership functions Cji. A Gaussian function is used as the membership function (Eq. 7); for each μC11(x1), there are two antecedent parameters to be tuned, ci and δi:

$$\mu_{C_{11}}(x) = \exp\left(-\frac{(c_i - x)^2}{2\delta_i^2}\right) \tag{7}$$

Layer 3: preliminary weights (firing strengths) are calculated as the product of the membership values:

$$w_i = \mu_{C_{i1}}(x_1) \cdot \mu_{C_{i2}}(x_2) \cdots \mu_{C_{im}}(x_m) \tag{8}$$

Layer 4: the weights are normalized:

$$\bar{w}_i = \frac{w_i}{\sum_i w_i} \tag{9}$$

Layer 5: the output of each rule is computed as:

$$f_i = \bar{w}_i \left(a_{0i} + \sum_{j=1}^{m} a_{ji} x_j\right) \tag{10}$$
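As a minimal sketch, the five layers can be traced in plain Python for one first-order Sugeno forward pass. It assumes one Gaussian membership function per input per rule (Eq. 7) and product firing strengths (Eq. 8); the dictionary layout of a rule and all parameter values in the usage lines are hypothetical, not the values tuned in this study. In PANFIS and GANFIS, the antecedent parameters (c, δ) and the consequent parameters (a) are the quantities that PSO or GA search for.

```python
import math

def gaussian_mf(x, c, delta):
    """Layer 2: Gaussian membership value, Eq. (7)."""
    return math.exp(-((c - x) ** 2) / (2.0 * delta ** 2))

def anfis_forward(x, rules):
    """First-order Sugeno ANFIS output for one input vector x.

    Each rule is a dict (hypothetical layout) with
      "mf":   list of (c, delta) antecedent pairs, one per input
      "coef": list [a0, a1, ..., am] of consequent parameters
    """
    # Layers 2-3: firing strength w_i as a product of membership values (Eq. 8)
    w = []
    for rule in rules:
        strength = 1.0
        for xj, (c, delta) in zip(x, rule["mf"]):
            strength *= gaussian_mf(xj, c, delta)
        w.append(strength)
    total = sum(w)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    # Layer 4: normalised firing strengths (Eq. 9)
    w_bar = [wi / total for wi in w]
    # Layer 5: weighted linear consequents (Eq. 10), summed into the final output
    output = 0.0
    for wb, rule in zip(w_bar, rules):
        a0, *a = rule["coef"]
        output += wb * (a0 + sum(aj * xj for aj, xj in zip(a, x)))
    return output

# Two hypothetical rules over two inputs that share membership functions
rules = [
    {"mf": [(0.0, 1.0), (0.0, 1.0)], "coef": [1.0, 0.0, 0.0]},
    {"mf": [(0.0, 1.0), (0.0, 1.0)], "coef": [3.0, 0.0, 0.0]},
]
y_hat = anfis_forward([0.5, -0.5], rules)  # equal firing strengths: average of 1.0 and 3.0
```

The two rules share membership functions only so that the result is easy to check by hand; in a trained model each rule has its own (c, δ) pairs.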


Table 2
Data generation and analysis.
                          Training dataset (70%)            Testing dataset (30%)
No.  Values               PANFIS  GANFIS  SVR    ANN        PANFIS  GANFIS  SVR    ANN
1    Minimum              0.104   0.104   0.104  0.104      0.104   0.104   0.104  0.104
2    Maximum              0.301   0.301   0.301  0.293      0.283   0.294   0.293  0.301
3    Mean                 0.199   0.195   0.196  0.201      0.190   0.2     0.199  0.187
4    Standard deviation   0.048   0.048   0.045  0.044      0.044   0.046   0.053  0.053
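Table 2 lists the statistics of a separate 70/30 split for each model. A minimal sketch of one such split and of the four summary statistics is given below; seed-based random shuffling and the use of the sample standard deviation are assumptions, since the exact splitting procedure is not specified, and the toy values are only the Y entries visible in Table 1, not the full 188 samples.

```python
import random
import statistics

def split_70_30(samples, seed):
    """Shuffle with a fixed seed and split into 70% training / 30% testing."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def describe(values):
    """Minimum, maximum, mean and sample standard deviation (cf. Table 2)."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
    }

# Y values of the rows shown in Table 1 (toy subset)
y = [0.16, 0.17, 0.18, 0.16, 0.14, 0.11, 0.14, 0.16, 0.14, 0.14, 0.14]

# Trying several seeds mimics the "different splitting strategies" of Section 2.3
for seed in (1, 2, 3):
    train, test = split_70_30(y, seed)
    print(seed, len(train), len(test), describe(train), describe(test))
```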

Fig. 4. Basic structure of ANFIS.

Fig. 5. Methodology chart of the study.

The output value is generated as the summation of the fi and goes through a defuzzification process to return the final value. ANFIS uses fuzzy rules of the following form:

$$\text{Rule } k:\ \text{IF } x_1 \text{ is } C_{k1} \text{ AND } x_2 \text{ is } C_{k2} \ldots \text{ AND } x_m \text{ is } C_{km} \text{ THEN } f_k = a_{0k} + \sum_{i=1}^{m} a_{ik} x_i$$

where xi are the controlling factors, such as clay content, plastic limit, and liquid limit, Cki is a linguistic label, and μCji(x) is the membership value that


defines how much the factor x belongs to Cji, and aik are the parameters of the linear function used to estimate y.

3.2. Particle swarm optimization

Particle Swarm Optimization (PSO) is based on the idea of swarm intelligence to find optimal solutions in a given search space (Poli et al., 2007). PSO is initialized with a random group of individuals, which is then optimized by updating generations. In each generation, each individual is updated using two best positions. The first is the best position the individual has reached so far, called Pbest. The second is the best position found by the entire population so far, called Gbest. In other words, each individual updates its position according to its own best position and the swarm's best position (Zhou et al., 2011).

In this research, the Root Mean Square Error (RMSE) is used as the objective function; a lower RMSE indicates a more accurate model. The PSO algorithm evaluates the objective function to determine whether the criteria are met:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(pr_i - y_i)^2} \tag{11}$$

where pri is the value predicted by the model, yi is the measured shear strength of the soil, and n is the number of data samples.

If the criteria at position xi are not met, the next position is generated with a new particle velocity:

$$v_i^{k+1} = \omega v_i^{k} + ac_1 r_1 (pbest_i - x_i^{k}) + ac_2 r_2 (gbest - x_i^{k}) \tag{12}$$

$$x_i^{k+1} = x_i^{k} + v_i^{k+1} \tag{13}$$

where xik is the position of individual i at generation k, vik is the velocity of individual i at generation k, xik+1 and vik+1 are the position and velocity of individual i at generation k + 1, pbesti is the best location of individual i, and gbest is the best location of all individuals in the swarm; ω is the inertia weight, ac1 and ac2 are acceleration coefficients, and r1 and r2 are random numbers in [0, 1]. If the criteria are met or the maximum number of iterations is reached, the algorithm stops.

3.3. Genetic algorithm

Genetic Algorithm (GA) is an optimization and search technique based on the principles of genetics and natural selection (Johari et al., 2011). GA starts by creating a group of solutions (a population of individuals), each represented by a chromosome. Individuals in the population are used to create new individuals, with the expectation that the new population outperforms the old one. The individuals selected to create offspring are chosen according to their fitness: the fitter an individual, the more likely it is to reproduce. This process is repeated until the stopping conditions are satisfied. The genetic operators used in this algorithm are selection, crossover, and mutation (Johari et al., 2011).

Fig. 6. RMSE analysis of the PANFIS and GANFIS.

3.4. Support vector regression

Support Vector Regression (SVR) uses the same principles as the support vector machine (SVM) for classification, except for a different type of loss function. Consider a regression problem with a given training dataset expressed in a vector space, where each sample is a point. In classification, the method finds the best hyperplane that divides the points into two distinct classes, class + and class − (binary classification). The quality of this hyperplane is determined by the distance (called the margin) from the nearest data point of each class to the plane; the larger the margin, the better the decision plane and the more accurate the classification. The goal of the SVR method is to find the maximum margin (Basak et al., 2007). In this study, the SVR parameters were determined through a trial-and-error process.

3.5. Artificial Neural Networks

Artificial Neural Networks (ANN) are a popular machine learning technique inspired by the way the human brain processes information. They make decisions by detecting and analyzing relationships and patterns in the data themselves (Behrang et al., 2010). In this study, a multi-layer perceptron neural network was selected as the regression method for predicting the strength of soft soils. Each neuron computes a weighted sum of its inputs and passes it through the sigmoid activation function:

$$y_j = f_j(x) = \frac{1}{1 + e^{-x}} \tag{14}$$

where x is the weighted sum of the inputs (x1, x2, …, xk) of neuron j, which for the hidden layer are the soil properties X1 to X4, and yj is the output of neuron j.

3.6. Quality assessment

The accuracy of a model is assessed by the Root Mean Square Error (RMSE) and the correlation coefficient (R). These two indicators are popular in model validation. The formulas are as follows:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(pr_i - y_i)^2} \tag{15}$$

Table 3
Validation of the PANFIS with different values of initial weight.
Initial weight   0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1
R                0.59    0.591   0.554   0.601   0.593   0.538   0.588   0.598   0.533   0.533
RMSE             0.0365  0.0364  0.0374  0.034   0.0361  0.0392  0.0363  0.0342  0.0359  0.0359
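The velocity and position updates of Eqs. (12) and (13), with RMSE as the objective, can be sketched as a minimal global-best PSO. The inertia weight defaults to 0.4, the value selected in Table 3; the acceleration coefficients (1.5), swarm size, iteration count, clipping to a search box and the toy line-fitting objective are assumptions for illustration. In PANFIS the position vector would hold the ANFIS antecedent and consequent parameters rather than two line coefficients.

```python
import math
import random

def pso_minimize(objective, dim, bounds, n_particles=20, n_iter=200,
                 w=0.4, ac1=1.5, ac2=1.5, seed=0):
    """Minimise `objective` over a box with global-best PSO, Eqs. (12)-(13)."""
    rng = random.Random(seed)
    low, high = bounds
    x = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [objective(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (12): inertia + pull towards pbest_i + pull towards gbest
                v[i][d] = (w * v[i][d]
                           + ac1 * r1 * (pbest[i][d] - x[i][d])
                           + ac2 * r2 * (gbest[d] - x[i][d]))
                # Eq. (13): move the particle, kept inside the search box
                x[i][d] = min(max(x[i][d] + v[i][d], low), high)
            val = objective(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val

# Toy objective: RMSE of a line a*x + b against hypothetical data y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def line_rmse(params):
    a, b = params
    return math.sqrt(sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

best, best_rmse = pso_minimize(line_rmse, dim=2, bounds=(-5.0, 5.0))
```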


Table 4
Validation of the GANFIS with different values of Gamma.
Gamma   0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1
R       0.538   0.572   0.533   0.5777  0.582   0.45    0.59    0.248   0.513   0.543
RMSE    0.037   0.0352  0.0359  0.035   0.0354  0.038   0.035   0.194   0.0366  0.0356
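A real-coded GA using the four settings reported for GANFIS (crossover percentage 0.4, mutation percentage 0.7, gamma 0.7, mutation rate 0.5) might look like the minimal sketch below. How these parameters act is an assumption: here gamma widens the range of the arithmetic-crossover coefficient, the two percentages set how many offspring and mutants are produced per generation, and the mutation rate is the fraction of genes perturbed. Tournament selection, elitist replacement, the mutation step size and the toy objective are likewise illustrative choices, not the authors' implementation.

```python
import math
import random

def ga_minimize(objective, dim, bounds, npop=30, n_iter=200,
                pc=0.4, pm=0.7, gamma=0.7, mu=0.5, seed=0):
    """Minimise `objective` with a simple elitist real-coded GA."""
    rng = random.Random(seed)
    low, high = bounds
    sigma = 0.1 * (high - low)  # assumed mutation step size

    def clip(z):
        return [min(max(zi, low), high) for zi in z]

    def select(pop):
        # binary tournament: the fitter (lower RMSE) of two random individuals
        a, b = rng.sample(pop, 2)
        return a[0] if a[1] <= b[1] else b[0]

    pop = []
    for _ in range(npop):
        z = [rng.uniform(low, high) for _ in range(dim)]
        pop.append((z, objective(z)))

    n_pairs = round(pc * npop / 2)      # each crossover pair yields two children
    n_mutants = round(pm * npop)
    n_genes = max(1, round(mu * dim))
    for _ in range(n_iter):
        children = []
        for _ in range(n_pairs):
            p1, p2 = select(pop), select(pop)
            # arithmetic crossover with coefficients drawn from [-gamma, 1 + gamma]
            alpha = [rng.uniform(-gamma, 1.0 + gamma) for _ in range(dim)]
            c1 = clip([a * u + (1 - a) * v for a, u, v in zip(alpha, p1, p2)])
            c2 = clip([a * v + (1 - a) * u for a, u, v in zip(alpha, p1, p2)])
            children += [(c1, objective(c1)), (c2, objective(c2))]
        for _ in range(n_mutants):
            child = select(pop)[:]
            for d in rng.sample(range(dim), n_genes):
                child[d] += rng.gauss(0.0, sigma)
            child = clip(child)
            children.append((child, objective(child)))
        # elitist replacement: keep the npop best of parents + children
        pop = sorted(pop + children, key=lambda t: t[1])[:npop]
    return pop[0]

# Same toy objective as a stand-in for the ANFIS training RMSE
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def line_rmse(params):
    a, b = params
    return math.sqrt(sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

best, best_rmse = ga_minimize(line_rmse, dim=2, bounds=(-5.0, 5.0))
```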

Table 5
Validation of the SVR with different values of Gamma.
Gamma   0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1
R       0.549   0.5341  0.5333  0.519   0.4858  0.4719  0.4356  0.3905  0.3457  0.329
RMSE    0.044   0.0445  0.0444  0.0449  0.0461  0.0467  0.048   0.0495  0.051   0.0517
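Table 5 tunes γ of the RBF kernel. A minimal sketch of the pieces involved is below: the RBF kernel k(x, z) = exp(-γ‖x − z‖²), Vapnik's ε-insensitive loss that distinguishes SVR from SVM classification, and the kernel-expansion prediction f(x) = Σ βi k(xi, x) + b. The support vectors, dual coefficients βi and bias b would come from a trained solver (LIBSVM, for example); the values in the usage lines are hypothetical. The ν parameter reported in the study belongs to ν-SVR, which bounds the fraction of support vectors instead of fixing ε, but yields a predictor of the same form. A larger γ makes each support vector's influence more local.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

def eps_insensitive_loss(y_true, y_pred, eps):
    """Errors smaller than eps (inside the eps-tube) cost nothing."""
    return max(0.0, abs(y_true - y_pred) - eps)

def svr_predict(x, support_vectors, betas, b, gamma):
    """Kernel expansion f(x) = sum_i beta_i * k(x_i, x) + b of a trained SVR."""
    return sum(beta * rbf_kernel(sv, x, gamma)
               for sv, beta in zip(support_vectors, betas)) + b

# Hypothetical trained model with two support vectors in the 4-input space
svs = [[0.5, 0.6, 0.4, 0.5], [0.2, 0.3, 0.1, 0.2]]
betas = [0.03, -0.02]
bias = 0.19
print(svr_predict([0.5, 0.6, 0.4, 0.5], svs, betas, bias, gamma=0.1))
```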

Table 6
Validation of the ANN with different numbers of hidden neurons.
Hidden neurons   1       2       3       4       5       6       7       8       9       10
R                0.3226  0.36    0.1594  0.4754  0.3992  0.4935  0.4681  0.4473  0.4385  0.36
RMSE             0.0545  0.0534  0.0614  0.0564  0.0519  0.0472  0.05    0.0648  0.0675  0.0645
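The 4-6-1 network selected in Table 6 can be sketched as a one-hidden-layer perceptron with sigmoid hidden units (Eq. 14), trained by plain stochastic gradient descent on squared error. The linear output neuron, learning rate, weight initialisation, epoch count and the synthetic, already-normalised training data are assumptions for illustration; the study does not report these training settings.

```python
import math
import random

def sigmoid(z):
    """Logistic activation of Eq. (14)."""
    return 1.0 / (1.0 + math.exp(-z))

class MLP:
    """n_in inputs -> n_hidden sigmoid units -> one linear output neuron."""

    def __init__(self, n_in=4, n_hidden=6, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y_hat = sum(w * hj for w, hj in zip(self.W2, h)) + self.b2
        return y_hat, h

    def train_step(self, x, y, lr=0.05):
        """One SGD step on 0.5 * (y_hat - y)^2 using backpropagation."""
        y_hat, h = self.forward(x)
        err = y_hat - y
        for j, hj in enumerate(h):
            # hidden error signal, sigmoid'(z) = h(1 - h); uses W2[j] before its update
            delta = err * self.W2[j] * hj * (1.0 - hj)
            self.W2[j] -= lr * err * hj
            for k, xk in enumerate(x):
                self.W1[j][k] -= lr * delta * xk
            self.b1[j] -= lr * delta
        self.b2 -= lr * err

def mse(net, data):
    return sum((net.forward(x)[0] - y) ** 2 for x, y in data) / len(data)

# Synthetic, already-normalised samples: four inputs -> a strength-like target
rng = random.Random(1)
data = []
for _ in range(40):
    x = [rng.random() for _ in range(4)]
    data.append((x, 0.1 + 0.2 * x[0] - 0.05 * x[3]))

net = MLP(n_in=4, n_hidden=6, seed=0)
mse_before = mse(net, data)
for epoch in range(300):
    for x, y in data:
        net.train_step(x, y)
mse_after = mse(net, data)
```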

Fig. 7. RMSE analysis of the models using the training dataset: (a) PANFIS, (b) GANFIS, (c) SVR, and (d) ANN.
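The two criteria of Section 3.6 are straightforward to compute. A minimal sketch of Eq. (15) and Eq. (16) follows, together with the check used in Section 4 of comparing RMSE with the standard deviation of the measured values; the measured and predicted values are hypothetical, and the population standard deviation is an assumed choice (it equals the RMSE of always predicting the mean).

```python
import math
import statistics

def rmse(pred, obs):
    """Eq. (15): root mean square error between predictions and measurements."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, obs)) / len(obs))

def pearson_r(pred, obs):
    """Eq. (16): Pearson correlation coefficient R."""
    mp = statistics.mean(pred)
    my = statistics.mean(obs)
    cov = sum((p - mp) * (y - my) for p, y in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sy = math.sqrt(sum((y - my) ** 2 for y in obs))
    return cov / (sp * sy)

# Hypothetical measured and predicted shear strengths (kg/cm^2)
obs = [0.16, 0.17, 0.18, 0.16, 0.14, 0.11]
pred = [0.15, 0.18, 0.17, 0.16, 0.15, 0.13]
err = rmse(pred, obs)
r = pearson_r(pred, obs)
# RMSE below the spread of the measurements means the model beats predicting the mean
print(f"RMSE = {err:.4f}, R = {r:.3f}, std(obs) = {statistics.pstdev(obs):.4f}")
```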

$$R = \frac{\sum_{i=1}^{n}(pr_i - \overline{pr})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(pr_i - \overline{pr})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{16}$$

where yi and ȳ are the measured values of soil shear strength and their mean, respectively, and pri and p̄r are the model output values and their mean.

4. Results and analysis

The methodological flow chart of this research is shown in Fig. 5.

Using the training dataset, the four models, PANFIS, GANFIS, SVR, and ANN, were trained and constructed for predicting soil strength. For PANFIS, an initial FIS model was first generated with initial parameters, in which a FIS structure was created based on a number of membership functions. Thereafter, PSO was used to search for the most suitable antecedent and consequent parameters for training the ANFIS. The fitness function (RMSE) was used to evaluate the performance of the model over 500 iterations (Fig. 6). In training with PSO, the inertia weight was set to 0.4, which gave the best RMSE (Table 3). Finally, the PANFIS model was constructed once the stopping criterion was reached or the RMSE was optimized. In terms of

Fig. 8. RMSE analysis of the models using the testing dataset: (a) PANFIS, (b) GANFIS, (c) SVR, and (d) ANN.

GANFIS, the process is similar to that of PANFIS; however, GA instead of PSO was used to find the most suitable antecedent and consequent parameters for training the ANFIS. In training with GA, the initial parameters, namely crossover percentage, mutation percentage, gamma, and mutation rate, were set to 0.4, 0.7, 0.7 (Table 4), and 0.5, respectively. The fitness function (RMSE) in GA was evaluated over 500 iterations (Fig. 6), and the stopping criterion on the RMSE was applied to construct the final GANFIS. For SVR, the RBF kernel was used to train the model, with the learning parameters gamma and nu set to 0.1 (Table 5) and 0.5 (Thomas et al., 2017), respectively. For ANN, the network was constructed with 4 input neurons, 6 hidden neurons, and 1 output neuron (Table 6). A trial-and-error procedure was applied to determine the initial parameter values of the models.

Validation was carried out to test the performance of the models for predicting soil strength using RMSE and R. Both the training and validating datasets were used in this task: the training dataset was used to assess the goodness of fit of the models, and the validating dataset was used to assess their predictive capability.

The models were validated and compared using the RMSE and R criteria, and the results are shown in Figs. 7, 8, 9, and 10. According to the RMSE results (Figs. 7 and 8), the RMSE values of the models vary from 0.027 to 0.0359 for the training dataset, which are smaller than the standard deviations of the training datasets used for the corresponding models (Table 2), indicating that all models perform well; however, PANFIS has the lowest RMSE of all the models (GANFIS, SVR, and ANN), indicating that it has the best goodness of fit. Similarly, for the testing dataset, the RMSE values of PANFIS, GANFIS, SVR, and ANN are 0.038, 0.04, 0.044, and 0.047, respectively, which are smaller than the standard deviations of the testing datasets used for the corresponding models. These results indicate that all four models perform well for predicting the strength of soft soils in this study, but PANFIS outperforms the other models (GANFIS, SVR, and ANN).

In terms of R, the training results (Fig. 9) show that the R values of the four models vary from 0.637 to 0.817, indicating that all four models fit the data reasonably well; PANFIS (R = 0.817) has the highest goodness of fit, followed by GANFIS (R = 0.654), SVR (R = 0.64), and ANN (R = 0.637). For the testing dataset (Fig. 10), PANFIS (R = 0.601), GANFIS (R = 0.569), and SVR (R = 0.549) show acceptable capability for predicting the strength of soft soils, whereas ANN (R = 0.49) performs slightly worse in this study.

5. Discussion

Determining the shear strength of soft soil is an important task in the audit and design of geotechnical structures. However, shear strength testing is time-consuming and requires costly laboratory equipment (Vanapalli et al., 1996). Predicting shear strength with advanced machine learning techniques is therefore an effective way to obtain it quickly and at low cost. Only a few studies have predicted the properties of soft soil using machine learning techniques (Chou et al., 2016; Samui, 2008), and the prediction of the shear strength of soft soil with these techniques is still limited and requires more advanced methods for better predictive capability. In this study, four machine learning methods, PANFIS, GANFIS, SVR, and ANN, were applied to improve the prediction of the shear strength of soft soil.

Based on the validation results, PANFIS, GANFIS, and SVR have acceptable capability for predicting the strength of soft soils, while ANN performs slightly worse in this study. PANFIS has the highest performance, followed by GANFIS, SVR, and ANN. A plausible reason is that PANFIS and GANFIS use the PSO and GA optimization techniques, which help reduce the prediction RMSE, and that PSO proved more effective than GA in this study.

Regarding the comparison of SVR and ANN, the

188
B.T. Pham et al. Catena 166 (2018) 181–191

Fig. 9. Correlation results analysis of the models using the training dataset:
(a) PANFIS, (b) GANFIS, (c) SVR, and (d) ANN.

optimization used to solve the constrained quadratic programming problem in SVR yields a global optimum, whereas the training of ANN can converge to a local optimum (Samui, 2008). Moreover, SVR has better generalization capability than ANN because it can deal with overtraining problems. In addition, ANN is controlled by many parameters (number of hidden layers, learning rate, number of training epochs, number of hidden nodes, momentum term, weight initialization techniques, and transfer functions), and these are difficult to optimize simultaneously during model learning (Samui, 2008). This result agrees with another study by Das et al. (2011), who stated that the SVM model is better than ANN models for predicting the residual strength of soft soil. ANN has the lowest predictive capability for the shear strength of soil in this study. However, its potential has been demonstrated by Kanungo et al. (2014), who stated that ANN is a promising method for predicting soil shear strength parameters.

Machine learning techniques such as PANFIS, GANFIS, SVR, and ANN are advanced methods for prediction problems. Even so, their performance depends significantly on the quality of the data used (Mair et al., 2000). In geotechnical problems, using variables determined from different experiments on different samples of the same soil can bias the outcomes and thus affect the performance of the models. In this study, the four models showed acceptable predictive capability. However, their capability might be improved in two ways: by providing more data for training the regression models, and by applying over-sampling or under-sampling methods (He et al., 2008) to deal with imbalanced data sets. In addition, different combinations of inputs might give different prediction outcomes, and this should be taken into account in further studies.

In fact, soil is a very complicated material whose properties are not easy to predict. Even laboratory determination of these properties is sometimes not very accurate because of many affecting factors (e.g., experimental conditions, equipment, and the experience of the testers). In this study, the advanced machine learning models predicted the shear strength of soft soil with an average error rate of 4.2%. This rate is lower than the standard deviation and acceptable for geotechnical problems. Thus, these machine learning techniques might also be used to predict other properties of soft soil.

Fig. 10. Correlation results analysis of the models using the testing dataset: (a) PANFIS, (b) GANFIS, (c) SVR, and (d) ANN.

6. Conclusion
This research investigated and compared the prediction
performance of the four machine learning methods PANFIS, GANFIS, SVR, and ANN for predicting the strength of soft soils. The plastic clay soil data were obtained from two projects, the Nhat Tan and Cua Dai bridges in Viet Nam. PANFIS and GANFIS are relatively new fuzzy inference systems that have rarely been explored for predicting the strength of soft soil. SVR and ANN, by contrast, are popular and efficient machine learning methods in soil strength modeling. The results show that the prediction quality of the strength of soft soils is strongly influenced by the method used. Among the four models, PANFIS and GANFIS have the highest prediction performance. We therefore conclude that PANFIS and GANFIS are valid tools for predicting the strength of soft soils.

The main advantage of PANFIS and GANFIS is that the two models were constructed and then optimized autonomously by two meta-heuristic optimization algorithms, PSO and GA. This helps ensure that the parameters of the inference rules for predicting the strength of soft soil in the two models are well optimized. Of the two models, PANFIS performed better. This is because PSO has a strong global search ability with quick convergence (Zhou et al., 2011). As a result, PSO found better optimized parameters than GA did in the GANFIS model.

The limitation of this research is that only two meta-heuristic optimization algorithms, PSO and GA, were investigated. Future research should therefore consider newer algorithms for optimizing the ANFIS model, such as fuzzy clustering (Son et al., 2011, 2012a, 2012b, 2013; Son, 2014, 2015; Wijayanto et al., 2016). It should also consider newer machine learning algorithms such as the Bagging framework (Pham et al., 2018a), ensembles (Chen et al., 2018), and advanced decision trees (Khosravi et al., 2018; Pham et al., 2018b).

Despite this limitation, the results of this study can help geotechnical engineers predict the strength of soft soils when auditing geotechnical structures and constructions in practice, provided that input variables such as CC, W, LL, and PL are available. This will also help reduce construction costs by reducing the cost of laboratory experiments.

Acknowledgements
The authors are greatly indebted to Prof. Dao Van Dong, Rector, and the GEOAI group, University of Transport Technology, Vietnam, for their support of this research.

References
Ali, H.A., Yousuf, Y.M., 2016. Improvement of shear strength of sandy soil by cement
grout with fly ash. J. Eng. 22, 16–34.
Azari, B., Fatahi, B., Khabbaz, H., 2014. Assessment of the elastic-viscoplastic behavior of soft soils improved with vertical drains capturing reduced shear strength of a disturbed zone. Int. J. Geomech. 16, B4014001.
Basak, D., Pal, S., Patranabis, D.C., 2007. Support vector regression. Neural Inf. Process. Lett. Rev. 11, 203–224.
Behrang, M., Assareh, E., Ghanbarzadeh, A., Noghrehabadi, A., 2010. The potential of different artificial neural network (ANN) techniques in daily global solar radiation modeling based on meteorological data. Sol. Energy 84, 1468–1480.
Bui, D.T., et al., 2016a. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. J. Hydrol. 540, 317–330.
Bui, K.-T.T., Bui, D.T., Zou, J., Van Doan, C., Revhaug, I., 2016b. A novel hybrid artificial intelligent approach based on neural fuzzy inference model and particle swarm optimization for horizontal displacement modeling of hydropower dam. Neural Comput. & Applic. 1–12. http://dx.doi.org/10.1007/s00521-016-2666-0.
Bui, D.T., et al., 2017. A hybrid artificial intelligence approach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest fire susceptibility modeling at a tropical area. Agric. For. Meteorol. 233, 32–44.
Chen, W., Panahi, M., Pourghasemi, H.R., 2017. Performance evaluation of GIS-based new ensemble data mining techniques of adaptive neuro-fuzzy inference system (ANFIS) with genetic algorithm (GA), differential evolution (DE), and particle swarm optimization (PSO) for landslide spatial modelling. Catena 157, 310–324.
Chen, W., Xie, X., Peng, J., Shahabi, H., Hong, H., Bui, D.T., ... Zhu, A.X., 2018. GIS-based landslide susceptibility evaluation using a novel hybrid integration approach of bivariate statistical based random forest method. Catena 164, 135–149.
Chou, J.-S., Yang, K.-H., Lin, J.-Y., 2016. Peak shear strength of discrete fiber-reinforced soils computed by machine learning and metaensemble methods. J. Comput. Civ. Eng. 30, 04016036.
Das, B.M., Sobhan, K., 2013. Principles of Geotechnical Engineering. Cengage Learning.
Das, S., Samui, P., Khan, S., Sivakugan, N., 2011. Machine learning techniques applied to prediction of residual strength of clay. Open Geosci. 3, 449–461.
Griffiths, S.C., Cox, B.R., Rathje, E.M., 2016. Challenges associated with site response analyses for soft soils subjected to high-intensity input ground motions. Soil Dyn. Earthq. Eng. 85, 1–10.
He, H., Bai, Y., Garcia, E.A., Li, S., 2008. ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), IJCNN 2008. IEEE, pp. 1322–1328.
Johari, A., Javadi, A., Habibagahi, G., 2011. Modelling the mechanical behaviour of unsaturated soils using a genetic algorithm-based neural network. Comput. Geotech. 38, 2–13.
Kanungo, D., Sharma, S., Pain, A., 2014. Artificial Neural Network (ANN) and Regression Tree (CART) applications for the indirect estimation of unsaturated soil shear strength parameters. Front. Earth Sci. 8, 439–456.
Kaya, A., 2009. Residual and fully softened strength evaluation of soils using artificial neural networks. Geotech. Geol. Eng. 27, 281–288.
Khan, S., Suman, S., Pavani, M., Das, S., 2016. Prediction of the residual strength of clay using functional networks. Geosci. Front. 7, 67–74.
Khosravi, K., Pham, B.T., Chapi, K., Shirzadi, A., Shahabi, H., Revhaug, I., Prakash, I., Bui, D.T., 2018. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 627, 744–755.
Kiran, S., Lal, B., Tripathy, S., 2016. Shear strength prediction of soil based on probabilistic neural network. Indian J. Sci. Technol. 9.
Kuo, Y., Jaksa, M., Lyamin, A., Kaggwa, W., 2009. ANN-based model for predicting the bearing capacity of strip footing on multi-layered cohesive soil. Comput. Geotech. 36, 503–516.
Mair, C., et al., 2000. An investigation of machine learning based prediction systems. J. Syst. Softw. 53, 23–29.
McGann, C.R., Bradley, B.A., Taylor, M.L., Wotherspoon, L.M., Cubrinovski, M., 2015. Development of an empirical correlation for predicting shear wave velocity of Christchurch soils from cone penetration test data. Soil Dyn. Earthq. Eng. 75, 66–75.
Moavenian, M., Nazem, M., Carter, J., Randolph, M., 2016. Numerical analysis of penetrometers free-falling into soil with shear strength increasing linearly with depth. Comput. Geotech. 72, 57–66.
Motaghedi, H., Eslami, A., 2014. Analytical approach for determination of soil shear strength parameters from CPT and CPTu data. Arab. J. Sci. Eng. 39, 4363–4376.
Oliveira, P.J.V., Correia, A.A., Lemos, L.J., 2017. Numerical prediction of the creep behaviour of an unstabilised and a chemically stabilised soft soil. Comput. Geotech. 87, 20–31.
Padmini, D., Ilamparuthi, K., Sudheer, K., 2008. Ultimate bearing capacity prediction of shallow foundations on cohesionless soils using neurofuzzy models. Comput. Geotech. 35, 33–46.
Pham, B.T., Bui, D.T., Prakash, I., Dholakia, M.B., 2017. Hybrid integration of Multilayer Perceptron Neural Networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS. Catena 149, 52–63.
Pham, B.T., Bui, D.T., Prakash, I., 2018a. Bagging based support vector machines for spatial prediction of landslides. Environ. Earth Sci. 77 (4), 146.
Pham, B.T., Prakash, I., Bui, D.T., 2018b. Spatial prediction of landslides using a hybrid machine learning approach based on random subspace and classification and regression trees. Geomorphology 303, 256–270.
Poli, R., Kennedy, J., Blackwell, T., 2007. Particle swarm optimization. Swarm Intell. 1, 33–57.
Pourghasemi, H.R., Rahmati, O., 2018. Prediction of the landslide susceptibility: which algorithm, which precision? Catena 162, 177–192.
Samui, P., 2008. Prediction of friction capacity of driven piles in clay using the support vector machine. Can. Geotech. J. 45, 288–295.
Shahin, M.A., Jaksa, M.B., Maier, H.R., 2009. Recent advances and future challenges for artificial neural systems in geotechnical engineering applications. Adv. Artif. Neural Syst. 2009, 5.
Sharma, B., Bora, P.K., 2003. Plastic limit, liquid limit and undrained shear strength of soil—reappraisal. J. Geotech. Geoenviron. 129, 774–777.
Shirzadi, A., Shahabi, H., Chapi, K., Bui, D.T., Pham, B.T., Shahedi, K., Ahmad, B.B., 2017. A comparative study between popular statistical and machine learning methods for simulating volume of landslides. Catena 157, 213–226.
Son, L.H., 2014. Enhancing clustering quality of geo-demographic analysis using context fuzzy clustering type-2 and particle swarm optimization. Appl. Soft Comput. 22, 566–584.
Son, L.H., 2015. A novel kernel fuzzy clustering algorithm for geo-demographic analysis. Inf. Sci. 317 (C), 202–223.
Son, L.H., et al., 2011. Developing JSG framework and applications in COMGIS project. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 3, 108–118.
Son, L.H., Cuong, B.C., Lanzi, P.L., Thong, N.T., 2012a. A novel intuitionistic fuzzy clustering method for geo-demographic analysis. Expert Syst. Appl. 39 (10), 9848–9859.
Son, L.H., Lanzi, P.L., Cuong, B.C., Hung, H.A., 2012b. Data mining in GIS: a novel context-based fuzzy geographically weighted clustering algorithm. Int. J. Mach. Learn. Comput. 2 (3), 235.
Son, L.H., Cuong, B.C., Long, H.V., 2013. Spatial interaction–modification model and applications to geo-demographic analysis. Knowl.-Based Syst. 49, 152–170.
Thomas, S., Pillai, G., Pal, K., 2017. Prediction of peak ground acceleration using ϵ-SVR, ν-SVR and Ls-SVR algorithm. Geomatics Nat. Hazards Risk 8, 177–193.
Vanapalli, S., Fredlund, D., 2000. Comparison of different procedures to predict unsaturated soil shear strength. Adv. Unsaturated Geotech. 195–209.
Vanapalli, S., Fredlund, D., Pufahl, D., Clifton, A., 1996. Model for the prediction of shear strength with respect to soil suction. Can. Geotech. J. 33, 379–392.
Whitlow, R., 1990. Basic Soil Mechanics.
Wijayanto, A.W., Purwarianti, A., Son, L.H., 2016. Fuzzy geographically weighted clustering using artificial bee colony: an efficient geo-demographic analysis algorithm and applications to the analysis of crime behavior in population. Appl. Intell. 44 (2), 377–398.
Zhou, D., et al., 2011. Randomization in particle swarm optimization for global search ability. Expert Syst. Appl. 38, 15356–15364.