
Geomorphology 362 (2020) 107201

Contents lists available at ScienceDirect

Geomorphology

journal homepage: www.elsevier.com/locate/geomorph

A random forest model of landslide susceptibility mapping based on hyperparameter optimization using Bayes algorithm

Deliang Sun a, Haijia Wen b,c,d,⁎, Danzhou Wang a, Jiahui Xu a

a The Key Laboratory of GIS Application Research, Chongqing Normal University, Chongqing 401331, China
b Key Laboratory of New Technology for Construction of Cities in Mountain Area, Ministry of Education, Chongqing 400045, China
c National Joint Engineering Research Center of Geohazards Prevention in the Reservoir Areas, Chongqing 400044, China
d School of Civil Engineering, Chongqing University, Chongqing 400045, China

Article info

Article history:
Received 26 November 2019
Received in revised form 5 April 2020
Accepted 5 April 2020
Available online 12 April 2020

Keywords:
Bayes algorithm
Random forest
Landslide susceptibility mapping
Hyperparameter optimization
Factor screening

Abstract

The choice of model parameters is a major determinant of model accuracy in landslide susceptibility mapping. The purpose of this study is to optimize the hyperparameters with a Bayesian optimization algorithm and thereby obtain a high-accuracy random forest model for landslide susceptibility evaluation. The research steps are as follows. Firstly, taking a typical landslide-prone mountainous area as an example, 16 conditioning factors, such as elevation, annual average rainfall, distance from roads and distance from buildings, were preliminarily selected as the conditioning factors of landslide susceptibility. Combined with 1520 historical landslide events, a geospatial database was established at 30 m resolution. Secondly, the geospatial sample set was constructed by random sampling with a ratio of historical landslides to non-landslides of 1:10. Based on the whole sample set, the hyperparameters of the random forest model were optimized with the Bayesian optimization algorithm. Next, the model was trained with the optimal hyperparameters to obtain the landslide susceptibility evaluation model, which was then used to map landslide susceptibility for the whole study area. After that, the recursive feature elimination method was used to screen out the dominant conditioning factors that can explain the degree of landslide susceptibility. The results indicated that the area under the curve (AUC) values of the receiver operating characteristic (ROC) curve for the training data set, the verification data set and the regional simulation were 0.95, 0.87 and 0.93, respectively. 65% of the historical landslides fell in the high and very high susceptibility regions, which made up <20% of the research area. The model was in good agreement with the distribution characteristics of historical landslides in the study area. We noted that all three recent typical landslides with impact on the study area occurred at locations that the model predicted to have high or very high susceptibility. As for conditioning factors, those related to human activities accounted for a large proportion of the contribution. In conclusion, a high-precision random forest model of landslide susceptibility can be built based on hyperparameter optimization with the Bayesian optimization algorithm. In addition, using the recursive feature elimination method, a random forest landslide susceptibility model with fewer dominant conditioning factors and guaranteed evaluation accuracy can be built, saving running time and input data resources.
© 2020 Elsevier B.V. All rights reserved.

⁎ Corresponding author at: Key Laboratory of New Technology for Construction of Cities in Mountain Area, Ministry of Education, Chongqing 400045, China.
E-mail address: [email protected] (H. Wen).

https://doi.org/10.1016/j.geomorph.2020.107201
0169-555X/© 2020 Elsevier B.V. All rights reserved.

1. Introduction

Landslides are among the most frequent geological hazards worldwide, characterized by wide distribution, high frequency, fast movement and serious losses (Yin and Zhu, 2001). According to natural hazard-related disaster reports (UNOCHA, 2019), from August 6 to 13, 2019, successive landslides occurred in many countries of the Asia and Pacific region, causing multiple deaths and disappearances and significant economic losses through damage to houses, crops, etc. In addition, according to Froude and Petley (2018), from 2004 to 2016 there were 4862 landslides in total with varying impacts around the world, which resulted in 55,997 deaths. Every year, the number of new landslide disaster sites continues to grow rapidly. Therefore, how to avoid and prevent new landslide disasters has become a key issue for local government departments and research institutes. As previous studies indicate, landslide susceptibility assessment is an effective solution. Research into landslide factors in high-susceptibility areas is of great significance for landslide prevention and control and for urban construction planning.

There has been much research on methods of landslide susceptibility assessment, a considerable proportion of which adopted statistical analyses in the early stage. However, because landslide development has complex nonlinear characteristics, many problems in the evaluation of regional landslide susceptibility, such as factor selection, parameter optimization and sample optimization of the model, have not been solved systematically. With the development of Geographic Information System and Artificial Intelligence (AI) technology, various machine learning methods have begun to be used in research, including logistic regression, Classification and Regression Tree (CART), Support Vector Machine (SVM) and so on. With strong robustness against over-fitting, these algorithms are suitable for nonlinear relationships between variables and can naturally model nonlinear decision boundaries. Earlier studies mainly predicted landslide susceptibility with a single model, with limited accuracy and over-fitting concerns. To avoid such issues, a random forest model combining multiple decision trees has been proposed to improve the prediction accuracy. The model contains multiple decision trees, and its output is determined by the mode of the outputs of the individual decision trees. Compared to traditional methods like logistic regression, it has certain advantages (Cao, 2014). Specifically, the model is capable of handling datasets with higher dimensions and larger data volume and has greater generalization ability.

While studies on biological information (Chen and Liu, 2005; Pang et al., 2010), medical science (Ying et al., 2008; Xie et al., 2009), business management (Ward et al., 2006; Kim et al., 2010) and other fields have already adopted the multiple decision trees model to achieve more sophisticated research results, its use in landslide prediction started late. Based on grid data with a resolution of 25 m and 14 conditioning factors including elevation, slope, aspect, vegetation index, lithology, etc., Hong et al. (2016) evaluated and validated landslide susceptibility in Lianhua County, China with the random forest model, and compared it with traditional statistical models (Evidence Belief Function, EBF; Logistic Regression, LR; Frequency Ratio, FR). Chen et al. (2017a), based on grid data of 30 m resolution, applied the random forest method to the spatial prediction of landslide susceptibility in Long County, China and compared advanced machine learning algorithms (Logistic Model Tree, LMT; Classification And Regression Tree, CART; Random Forest, RF). Yu et al. (2016) selected 12 conditioning factors from three categories, i.e., topography, meteorology and hydrology, and soil and vegetation. Based on grid data with a resolution of 30 m, the relationship between the occurrence of landslides and the conditioning factors in the Shunchang area of Fujian Province was empirically analyzed with the random forest model, and the applicability of the random forest model to the spatial prediction of landslides in South China was discussed. Based on grid data of 100 m resolution and 15 conditioning factors such as elevation, slope and aspect, Taalab et al. (2018) used the random forest model to evaluate landslide susceptibility and predict the spatial occurrence of landslides in the Piedmont region of northwest Italy.

Landslide susceptibility is subject to the combined effect of a variety of conditioning factors. Reichenbach et al. (2018) analyzed studies on landslide susceptibility assessment published from 1983 to 2016 and found that 596 conditioning factors were examined, with an average of 9 conditioning factors per model. The number of conditioning factors selected in most models was not large; moreover, in most cases these factors were selected subjectively according to expert experience. Studies on how to choose the dominant conditioning factors objectively are rarely seen in the relevant literature.

The accuracy of a model depends not only on the learning algorithm but also on the hyperparameters (i.e., parameters whose values are set before the learning process starts), which makes it necessary to optimize the model. The existing landslide susceptibility mapping (LSM) literature has paid more attention to comparing the accuracy of different modeling methods than to applying hyperparameter optimization in landslide machine learning modeling. The optimization of LSM conditioning factors and of the algorithm model therefore deserves study. Most existing research on hyperparameter optimization is in the field of computer algorithms. For example, Wang et al. (2014) proposed a hyperparameter selection method for support vector machines based on the Gaussian kernel, divided into two stages: selecting kernel parameters and training optimal penalty factors. The computational complexity of this method is low, the classification accuracy is high and the training time is reasonable. Kang et al. (2019) proposed a non-inertial particle swarm optimization with elite mutation-Gaussian process regression (NIPSO-GPR) to optimize the hyperparameters of GPR. Wan et al. (2010) proposed a simple, practical and time-effective method to select hyperparameters by orthogonal design: two kinds of Support Vector Machine (SVM) models of typical landslide displacement time series were designed by combining hyperparameters with orthogonal optimization, and a landslide prediction model with high accuracy and good generalization performance was obtained. However, hyperparameter optimization is rarely considered in studies of landslide susceptibility using the random forest model. Another factor that affects the accuracy of the model may be the selection of samples. The choice of training samples greatly influences model accuracy and can make training results inconsistent with the facts. On how to select training samples, Liu et al. (2020) proposed a method based on soil type classification, using the random forest model to update the soil map.

As the most typical mountainous county in Western China and the Three Gorges Reservoir area, Fengjie was chosen as the example for the present study. The mountainous area features frequent landslides resulting from the influence of migration, water storage and power generation, continuous precipitation, etc. in the reservoir area. Reservoir migration, in particular, intensifies construction activities, which have resulted in the reshaping of natural slopes. By using the Bayesian optimization algorithm for hyperparameter optimization, together with dominant conditioning factor screening analysis, an efficient random forest evaluation model of landslide susceptibility was constructed, and reliability evaluation and application verification were carried out.

The methodology of the study is illustrated in Fig. 1. Using satellite images, DEM, geological data and other multi-source data, 16 landslide susceptibility conditioning factors were extracted, a geospatial database was constructed, and random samples were selected based on historical landslide points. The Bayesian optimization algorithm was used to select the optimal hyperparameters, with which the random forest model was trained and tested to produce the landslide susceptibility assessment map of the study area. Next, recent landslide cases were compared and analyzed to verify the effectiveness of the random forest landslide susceptibility model after hyperparameter optimization. Finally, the importance of the conditioning factors and the influence patterns of typical conditioning factors were analyzed. The dominant conditioning factors that can be used to evaluate landslide susceptibility were screened out by the recursive feature elimination method, so as to build a random forest landslide susceptibility model that contains fewer dominant conditioning factors but maintains good evaluation accuracy.

2. Case research area overview and data sources

2.1. Research area overview

Located at 109°1′17″–109°45′58″E and 30°29′19″–31°22′33″N (Fig. 2), Fengjie County, the research area, is the east gate of Chongqing. This mountainous area with a complex tectonic stress field lies in the east of the Sichuan Basin, at the intersection of the Dabashan arc fold fault zone, the East Sichuan arc concave fold zone and the Sichuan, Hubei, Hunan and Guizhou uplift fold zone. The climate here is a mid-subtropical humid monsoon climate, with abundant rainfall and an annual average precipitation of 1132 mm.
There are numerous river systems in the region, including 17 river basins with an area of more than 50 km². The Yangtze River runs through the central part of the region, with an average annual discharge of 13,700 m³/s.

Fig. 1. Schematic representation of computational methodology.

Fig. 2. Location of the study case, Fengjie County, China.

2.2. Data sources

The data of 1520 historical landslides in Fengjie County from 2001 to 2016 and the related conditioning factors of landslide formation were collected, sorted and organized as shown in Table 1.

Table 1
Data and data sources.

Data name | Data sources | Type | Scale
Historical landslides | Chongqing Geological monitoring station | Datasheet |
DEM | Aster satellite | Grid | 30 m
Geological data | National Geological Data Center | Grid | 1:200,000
Land cover | Chongqing Municipal Bureau of land and resources | Vector | 1:100,000
Administrative division | Chongqing Municipal Bureau of land and resources | Vector | 1:100,000
River network | Chongqing Water Resources Bureau | Vector | 1:100,000
Satellite image | Geospatial Data Cloud platform | Grid | 30 m
Annual rainfall | Chongqing Meteorological Administration | Datasheet | 30 m
Road | Chongqing Transportation Commission | Vector | 1:100,000

The historical landslide data were sorted on two bases, i.e., type and trigger (Fig. 3). In terms of type, most of the landslides in the study area were small/shallow/soil ones (82%) and only 18% were large/deep/bedrock landslides. In terms of trigger, most of the landslides in the study area were caused by rainfall (75.88%), 10.05% were caused by ground water (pore water), 2.01% by human construction activities and 12.06% by coupling.

Fig. 3. Scale of landslide type and trigger: (a) landslide type; (b) trigger.

3. Geospatial databases

3.1. Conditioning factors

The formation mechanism of landslides is very complex, and landslide susceptibility is jointly affected by natural factors and human activities. Reichenbach et al. (2018) analyzed the studies related to landslide sensitivity evaluation from 1983 to 2016 and concluded that 596 factors were used for landslide sensitivity evaluation, i.e., 9 factors on average were used by each model, which shows that not many factors are considered in most models. According to Ayalew and Yamagishi (2005), the selected landslide influencing factors should be measurable, operable, non-uniform, complete and non-redundant. Therefore, we considered the types and triggers of landslides and increased the number of conditioning factors to 16. In this paper, the 16 factors of landslide susceptibility fall into four aspects: topography (elevation, slope, aspect, slope position, landforms, profile curvature, topographic wetness index (TWI) (Yu et al., 2017)), geological conditions (lithology, distance from faults, combination reclassification of stratum dip direction and slope aspect (CRDS) (Xie et al., 2018)), environmental conditions (Normalized Difference Vegetation Index (NDVI), distance from rivers, annual average rainfall, land cover), and human activities (distance from roads, distance from buildings). The specific factors considered were topographic wetness index (TWI), combination reclassification of stratum dip direction and slope aspect (CRDS), distance from rivers, annual average rainfall, distance from roads and distance from buildings. Among them, TWI is a composite topographical index for evaluating the spatial distribution of soil water, which describes the influence of terrain on the degree of soil water saturation. The content and distribution of water in the soil affect the condition of rock, soil and vegetation on the slope surface, and thus affect landslides. CRDS refers to the combination relationship between rock stratum dip direction and slope aspect, and acts as a comprehensive factor considering both terrain and geology (Wen et al., 2017). Distance from rivers was selected because rivers cut downward and laterally and exert wave impact on the slope bank, which removes the rock and soil mass at the slope toe and creates a free face there, thus preparing the conditions for the occurrence of landslides. One of the main landslide triggers studied in this paper is ground water (pore water), which indicates that rivers and other water bodies have played a great role in landslides within the study area. Annual average rainfall refers to the long-term average rainfall, which affects not only the slope itself but also the development of vegetation, surface runoff and other factors, thus affecting the development of landslides. It also corresponds to a main trigger, i.e., rainfall. Distance from roads and distance from buildings are the factors selected for the trigger of human activities. The construction of roads and buildings significantly lowers the stability of the slope, increases the micro-topography generated in the process of slope excavation and accelerates the occurrence of landslides.

3.2. Data processing

The data of slope, aspect, slope position, landforms (Weiss, 2001), profile curvature, TWI (Yu et al., 2017) and CRDS (Wen et al., 2017) were obtained by ArcGIS processing of the DEM. The data of lithology, faults and stratum occurrence were obtained by vectorization of the 1:200,000 geological map. NDVI was generated from Landsat 8 OLI data. The annual average rainfall was derived from grid data, which were produced from raw data by spatial interpolation. The raw data are the complete kilometer-grid product data from January 2008 to December 2014 provided by the Chongqing Meteorological Bureau. Distance from faults, distance from rivers, distance from roads, and distance from buildings were obtained by multi-level buffering of faults, rivers, roads, and buildings, respectively. All mapped faults were included; there are no active faults in this research area. We summarized the categories of conditioning factors in Table 2.

A geospatial database of landslide conditioning factors was established with a 30 m resolution grid cell as the basic unit for landslide susceptibility assessment (Fig. 4).

4. RF model based on hyperparameter optimization using Bayes algorithm

4.1. Random forest model

Random forest is an ensemble learning method, first proposed by Breiman (1996) and Cutler (2005), that constructs multiple decision trees from different data subsets and votes on the results of these decision trees to obtain the output of the random forest. A large body of research has shown that random forest is considerably tolerant of outliers and noise, unlikely to over-fit, and of high prediction accuracy and stability (Li, 2013).

The core of random forest is to construct a large number of unrelated decision tree models [h(X, θk); k = 1, …] for training. Each decision tree makes a prediction about the class of the sample separately (for the classification algorithm). The final output is the mode of these predicted classes. The performance of the random forest can be improved by constructing unrelated training sets in order to decrease the variance of the model. Different classifiers h1(X), …, hk(X) are obtained by training on these sets and are then combined to construct the random forest model. The output of the random forest is determined by a

voting process, as shown in Eq. (1):

\[ H(x) = \arg\max_{Z} \sum_{i=1}^{k} I\left(h_i(x) = Z\right) \qquad (1) \]

H(x) denotes the random forest model, h_i is a single decision tree model, Z is the output variable, and I(·) is the indicator function.

The construction of a random forest mainly includes the following steps (Fig. 5):
(1) A training set is generated for each decision tree by sampling. Using the bagging sampling technique, N training subsets are drawn with replacement; each training subset is smaller than the full training set, generally about one-third of the total training samples.
(2) N decision trees are generated and the random forest is constructed. Based on the training subsets established in the first step, a decision tree is built for each subset. In the process of building a decision tree, the CART algorithm is used to split nodes. CART uses the principle of Gini coefficient minimization: a randomly selected object at node t is assigned to class i with probability p(i|t), and the estimated probability that the object actually belongs to class j is p(j|t). Under this rule, the estimated probability of misclassification is given by Eq. (2):

\[ \mathrm{Gini} = \sum_{i \neq j} p(i|t)\, p(j|t) \qquad (2) \]

Table 2
Categories of landslide conditioning factors.

Conditioning factor | Classes | Classification standard
Elevation/(m) | 7 | 1. <340; 2. 340–595; 3. 595–850; 4. 850–1105; 5. 1105–1360; 6. 1360–1615; 7. >1615
Slope/(°) | 6 | 1. <10°; 2. 10°–20°; 3. 20°–30°; 4. 30°–40°; 5. 40°–50°; 6. >50°
Aspect/(°) | 9 | 1. Flat; 2. North; 3. Northeast; 4. East; 5. Southeast; 6. South; 7. Southwest; 8. West; 9. Northwest
Slope position | 6 | 1. Valleys; 2. Lower slope; 3. Flat slope; 4. Middle slope; 5. Upper slope; 6. Ridge
Landforms | 10 | 1. Canyons, deeply incised streams; 2. Midslope drainages, shallow valleys; 3. Upland drainages, headwaters; 4. U-shape valleys; 5. Plains; 6. Open slopes; 7. Upper slopes, mesas; 8. Local ridges, hills in valleys; 9. Midslope ridges, small hills in plains; 10. Mountain tops, high narrow ridges
Profile curvature | 7 | 1. −1.0; 2. −1–0.5; 3. −0.5–0; 4. 0–0.5; 5. 0.5–1.0; 6. 1.0–1.5; 7. >1.5
TWI | 7 | 1. <10; 2. 10–12; 3. 12–14; 4. 14–16; 5. 16–18; 6. 18–20; 7. >20
Lithology | 7 | 1. TJx; 2. T1j; 3. D; 4. T1d-j; 5. J2s, J1z-2x, J3sn, J3p; 6. T1d, T3xj, T2b; 7. P, P3
Distance from faults/(m) | 11 | 1. <100; 2. 100–200; 3. 200–300; 4. 300–400; 5. 400–500; 6. 500–600; 7. 600–700; 8. 700–800; 9. 800–900; 10. 900–1000; 11. >1000
CRDS | 6 | 1. Bedding slope; 2. Skewed slope; 3. Inclined slope; 4. Horizontal; 5. Reverse slope; 6. Flat
NDVI | 7 | 1. <0.10; 2. 0.10–0.20; 3. 0.20–0.30; 4. 0.30–0.40; 5. 0.40–0.50; 6. 0.50–0.60; 7. >0.60
Distance from rivers/(m) | 7 | 1. <100; 2. 100–200; 3. 200–300; 4. 300–400; 5. 400–500; 6. 500–600; 7. >600
Land cover | 6 | 1. Cultivated land; 2. Woodland; 3. Meadow; 4. Land used for building; 5. Water area; 6. Unused land
Distance from roads/(m) | 7 | 1. <100; 2. 100–200; 3. 200–300; 4. 300–400; 5. 400–500; 6. 500–600; 7. >600
Distance from buildings/(m) | 7 | 1. <100; 2. 100–200; 3. 200–300; 4. 300–400; 5. 400–500; 6. 500–600; 7. >600
Annual average rainfall/(mm) | 5 | 1. <990; 2. 990–1040; 3. 1040–1100; 4. 1100–1160; 5. >1160

4.2. Model hyperparameter optimization

Hyperparameter optimization has a great influence on the accuracy of a machine learning model. Hyperparameter optimization selects the optimal hyperparameter values according to an evaluation index. In this research, the Bayesian optimization algorithm (Garrido-Merchán and Hernández-Lobato, 2019) is used to determine the optimal hyperparameter values, with the advantage that the optimal value can be obtained in a short time. Built on a Gaussian process (GP), the Bayesian algorithm can make full use of prior knowledge and is more robust. The algorithm needs only input and output data: it fits the posterior distribution of the objective function as the number of samples increases, and in this way optimizes the hyperparameters of the model and obtains the optimal solution.

The Bayes optimization (BO) methodology relies on fitting a probabilistic model to observations of the black-box objective that is being optimized. The predictive distribution of that model specifies the potential values of the objective at each point of the input space. By taking this predictive distribution into account, BO methods guide the search, focusing on those regions of the input space that are expected to deliver the most information about the solution of the optimization problem. Typically, the probabilistic model used for BO is a Gaussian Process (GP) (Garrido-Merchán and Hernández-Lobato, 2019). The reason for this is the ability of GPs to easily compute a predictive distribution of the objective.

A GP is defined as a prior distribution over functions. When using a GP as the underlying model, the assumption made is that the black-box objective function f(·) that is being optimized has been randomly sampled from the GP, i.e., f(·) ∼ GP(0, k(·,·)). This distribution is fully specified by a covariance function k(x, x′) and a zero mean. The intrinsic features of the objective f(·), such as smoothness, level of additive noise, amplitude, etc., are specified by the covariance function k(x, x′). The output of this function is simply the covariance between f(x) and f(x′). In the general Gaussian model, the probability of each feature needs to be calculated and then accumulated. In the multivariate Gaussian probability model in the GP, the covariance matrix needs to be constructed, and the probability values of all feature vectors are used.

The final multivariate Gaussian probability model is:

\[ P(x) = \frac{1}{(2\pi)^{n/2}\,|\mathrm{cov}|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{T}\,\mathrm{cov}^{-1}\,(x-\mu)\right) \qquad (3) \]

where the mean (μ) and covariance (cov) are given by:

\[ \mu = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (4) \]

\[ \mathrm{cov} = \frac{1}{n}\sum_{i=1}^{n} (x_i-\mu)(x_i-\mu)^{T} \qquad (5) \]

In this study, the Bayes optimization algorithm was used, and the cross-validation accuracy was used as the objective function for optimization. The 7 main hyperparameters involved in the RF model are shown in Table 3. The criterion was set to Gini. Because the number of samples in the study was 16,720, min_samples_leaf was left at its default value of 1. Because sampling with replacement was unavoidable in sample selection, bootstrap was set to true. In this study, we optimized the remaining hyperparameters n_estimators, max_depths, min_samples_splits and max_features, and output the hyperparameter values obtained in each iteration.

In this study, 1520 historical landslide data in the study area were selected as positive samples. The 500 m buffer area around all landslide points and the river areas were removed from the study area, and the remaining area was regarded as the non-landslide area. The 15,200 non-landslide points were randomly selected as negative samples according to the ratio of 1:10 (Das et al., 2012), thus
6 D. Sun et al. / Geomorphology 362 (2020) 107201

˄a˅Elevation ˄b˅Slope ˄ c˅Aspect ˄d˅Slope position

˄e˅ ˄f˅Profile curvature ˄g˅TWI ˄h˅Lithology

˄i˅ Distance from faults ˄j˅CRDS ˄k˅NDVI ˄l˅Distance from rivers

(o) Distance from buildings ˄p˅Annual average rainfall

Fig. 4. Conditioning factors of landslide susceptibility.

D. Sun et al. / Geomorphology 362 (2020) 107201 7

Fig. 5. The process of random forest.

forming the sample data set. Generally, the receiver operating characteristic (ROC) curve can be used to test the evaluation results of typical binary classification problems such as landslides. The area under the ROC curve (AUC) quantitatively represents the accuracy of model prediction (Li et al., 2014). The ROC curves (Fig. 6) of the models constructed with the different parameter sets proposed by the aforementioned Bayesian optimization algorithm were drawn, and the AUC values were calculated. An AUC of 1 represents the ideal model and an AUC of 0.5 represents a model with no discriminative ability; the higher the value, the better the model. During the hyperparameter optimization iterations, the AUC values of the models obtained with different parameters ranged between 0.81 and 0.91. We chose the hyperparameters corresponding to the highest value (0.91): [‘n_estimators’: 50, ‘max_depths’: 16, ‘min_samples_split’: 4, ‘max_features’: 10]. That is to say, the hyperparameters of the optimized model were used in later model training.

Table 3
Main hyperparameters involved in RF.

Hyperparameter      Explanation
n_estimators        The number of decision trees.
criterion           The criterion used to split samples: Gini or entropy.
min_samples_split   The minimum number of samples required to split a node.
max_depths          The maximum depth of a tree; by default, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.
max_features        The maximum number of features considered for a split; by default, the square root of the number of features.
min_samples_leaf    The minimum number of samples required at a leaf node.
bootstrap           Whether sampling is done with replacement; the value is true or false.

Fig. 6. ROC curves of RF with different hyperparameter values.

D. Sun et al. / Geomorphology 362 (2020) 107201

4.3. Model training and accuracy test method

Using the above optimized hyperparameters, the random forest model can be trained and constructed. In order to reduce the influence of a single sampling on the model results, 5-fold cross-validation was used to select training data and test data. The 5-fold cross-validation method randomly divided the whole dataset (1520 positive samples and 15,200 negative samples) into five disjoint subsets of equal size. Each subset was used for testing in turn, and the remaining subsets were used for model training (Li et al., 2014).
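As a hedged illustration of the two evaluation ingredients described above, the following minimal Python sketch shows a random five-fold split and a rank-based AUC calculation. It is not the authors' implementation and it performs no Bayesian optimization; the function names k_fold_indices and auc are hypothetical, and only the sample counts (1520 and 15,200) are taken from the text.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Randomly split range(n_samples) into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th shuffled index gives k disjoint folds that cover all samples.
    return [indices[i::k] for i in range(k)]


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 1520 landslide samples plus 15,200 non-landslide samples, as in the text.
    folds = k_fold_indices(1520 + 15200, k=5)
    print([len(fold) for fold in folds])
    # Toy scores only; these are not results from the study.
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

In each round, one fold would serve as the test set and the other four as the training set; a candidate hyperparameter set would then be scored by its AUC. The pairwise AUC loop is quadratic and is meant for small illustrations only.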
The confusion matrix is also used to analyze the prediction accuracy of the model (Guo et al., 2019; Sun et al., 2019). For landslides, a typical binary classification problem, the threshold value of the analysis results is 0.5: a landslide is predicted when the predicted value is >0.5, and no landslide is predicted when the predicted value is <0.5. On this basis, the discrimination accuracy is obtained, and the cross-validation results are analyzed with the ROC curve. Thus, the discrimination accuracy from the confusion matrix and the AUC value for each validation run can be obtained (Table 4).
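The thresholding step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and a predicted value of exactly 0.5, which the text does not cover, is assumed here to count as non-landslide.

```python
def confusion_counts(labels, probabilities, threshold=0.5):
    """Return (tp, fp, tn, fn); a sample is predicted as a landslide if p > threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probabilities):
        predicted = 1 if p > threshold else 0  # assumption: p == threshold -> non-landslide
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of correctly classified samples."""
    return (tp + tn) / (tp + fp + tn + fn)


if __name__ == "__main__":
    # Toy labels and probabilities only; these are not data from the study.
    counts = confusion_counts([1, 1, 0, 0, 0], [0.9, 0.3, 0.6, 0.2, 0.1])
    print(counts, accuracy(*counts))
```

The number of misjudged samples reported per fold corresponds to fp + fn in this notation.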
It can be seen that in the 5-fold cross-validation, the average accuracy on the training dataset was 0.929, with an average AUC value of 0.948; the average accuracy on the test dataset was 0.918, with an average AUC value of 0.853. An AUC of 1 represents the ideal model and an AUC of 0.5 represents a model with no discriminative ability; the higher the value, the better the model, so the test results were satisfactory. Besides, the AUC value varied within a small range and its distribution was relatively concentrated, which indirectly indicated that the model was stable and not easily affected by how the dataset was partitioned. The AUC values of sample 4 for the training dataset, the test dataset, and the whole dataset were 0.95, 0.87, and 0.93 (Fig. 7), respectively, which was the highest accuracy. The model established by this sample has good stability and reliability, and can be used for the analysis of landslide susceptibility in the research area.

Fig. 7. The ROC curve of the training dataset, the validation dataset and the simulated results.

4.4. Factor importance analysis

Since numerous conditioning factors affect landslide susceptibility in different ways, it is necessary to study the importance and the mechanism of the conditioning factors in order to provide guidance for landslide disaster prediction and prevention. In this study, the importance of the 16 conditioning factors was evaluated using the Mean Decrease Accuracy (Du et al., 2017) of the random forest model, i.e., the values of one conditioning factor are scattered (randomly permuted), and the reduction in the prediction accuracy of the model before and after scattering is analyzed. The larger the reduction is, the more important the factor is. The order of conditioning factor importance is shown in Fig. 8. Elevation was the most important conditioning factor affecting the occurrence of landslides in the study area, with an average accuracy reduction of 17.77; annual average rainfall was the second, with an average accuracy reduction of 11.93; and the two least significant conditioning factors were slope position (1.98) and distance from fault (1.23).

5. The application of random forest landslide susceptibility mapping method

5.1. Susceptibility mapping and the distribution characteristics of historical landslides

The model established by sample 4 was applied to the geospatial database of the study area, and the landslide probability P of the study area was simulated. According to the expert experience method, it was divided into five grades (very low susceptibility region with P < 0.06; low susceptibility region with 0.06 ≤ P < 0.14; moderate susceptibility region with 0.14 ≤ P < 0.25; high susceptibility region with 0.25 ≤ P < 0.4; very high susceptibility region with P ≥ 0.4). The landslide susceptibility map of Fengjie County was thus obtained (Fig. 9). It can be found that most areas of Fengjie County were located in low susceptibility regions, mainly in the south and southeast; very high susceptibility regions were mainly located on both sides of the Yangtze River and its tributaries, in the north and central Fengjie.

Table 5 is a statistical table of the number of grid units and historical landslides in each susceptibility region. It can be seen that the proportion of historical landslides increased gradually with the increase of the susceptibility grade, and the density of landslides was positively correlated with the susceptibility grade. The area of the very low and low susceptibility regions accounted for 67.72% of the total area of the study area, while the number of historical landslides there only accounted for 17.60% of the total landslides. The area of the very high and high susceptibility regions accounted for 18.17% of the total area, while the number of historical landslides there accounted for 65.2% of the total landslides.

5.2. Distribution characteristics of new landslide events

In order to verify the results of susceptibility zoning, the data of landslides in the study area in 2017 were collected. There were 85 landslides in Fengjie County in 2017, and among them 61 landslides, 71.8% of the total, were new landslide events. The locations of 25 landslides in 2017 were available with geographic coordinates. Projecting the location coordinates onto the landslide susceptibility map showed, as expected, that most of the new landslide events were located in either the high susceptibility region or the very high susceptibility region.
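The five-grade zoning of Section 5.1 and the check of new landslide locations in Section 5.2 can be sketched as follows. This is a minimal Python sketch, assuming the probability P of each grid cell or landslide point is already available; the function names and the toy probabilities are hypothetical, and only the four thresholds (0.06, 0.14, 0.25, 0.4) come from the text.

```python
from bisect import bisect_right

# Grade boundaries from Section 5.1: a grade covers lower <= P < upper.
THRESHOLDS = [0.06, 0.14, 0.25, 0.4]
GRADES = ["very low", "low", "moderate", "high", "very high"]


def susceptibility_grade(p):
    """Map a landslide probability in [0, 1] to one of the five susceptibility grades."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_right counts how many thresholds are <= p, which is the grade index.
    return GRADES[bisect_right(THRESHOLDS, p)]


def share_in_high_grades(probabilities):
    """Fraction of points whose grade is high or very high."""
    grades = [susceptibility_grade(p) for p in probabilities]
    hits = sum(g in ("high", "very high") for g in grades)
    return hits / len(grades)


if __name__ == "__main__":
    # Toy probabilities sampled at hypothetical landslide locations.
    points = [0.05, 0.2, 0.3, 0.45, 0.6]
    print([susceptibility_grade(p) for p in points])
    print(share_in_high_grades(points))
```

Using a sorted threshold list keeps the boundaries in one place, so a different expert zoning only requires editing THRESHOLDS and GRADES.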
Table 4
The accuracy of 5-fold cross-validation.

Order number   Misjudgment            Accuracy               AUC value
               Training   Test        Training   Test        Training   Test
1              953        245         0.930      0.926       0.952      0.851
2              906        300         0.932      0.910       0.949      0.842
3              917        289         0.931      0.914       0.951      0.858
4              954        278         0.929      0.917       0.951      0.869
5              1128       253         0.923      0.924       0.939      0.844
Mean           972        273         0.929      0.918       0.948      0.853

Three typical landslides in 2017, shown in Fig. 10, were analyzed as case studies: (1) Xinpu landslide (Fig. 10a) is located in Anping town on the right bank of the Yangtze River in Fengjie County, and is a landslide group composed of multiple landslides. According to the descending alphabetical order, the names of the landslide group members are Daping landslide, Shangertai landslide and Xiaertai landslide. From June 1 to July 31, 2017, Xinpu landslide partially deformed due to rainfall, and the largest landslide deformation occurred on June 29; (2) Zhakuoshi landslide (Fig. 10b) is located in Group 2, Longpo village, Kangle Town, Fengjie County, on the sloping section on the right bank of Meixi River, a first-order tributary of the Yangtze River. From September 1 to October 31, the rainfall induced continuous deformation activity of

Fig. 8. The sorting of conditioning factors.

Zhakuoshi landslide. There were many tension cracks in the front part of the landslide body, and the largest deformation occurred on October 3. (3) Huoshitan landslide is located in the slope zone on the left bank of Meixi River, a first-order tributary of the Yangtze River, in Qiaowan village, Xincheng Town, Fengjie County. From October 1 to October 31, a large-scale soil landslide (Fig. 10c) occurred in Huoshitan. Rainfall induced continuous deformation, multiple tension cracks appeared in the front of the landslide body, and the largest landslide deformation occurred on October 7.

Through comparative analysis, both Xinpu landslide and Huoshitan landslide are in high and very high susceptibility regions, and Zhakuoshi landslide is in the very high susceptibility region. In conclusion, most of the new landslide events are located in either the high susceptibility region or the very high susceptibility region. The landslide susceptibility mapping model has strong prediction ability.

6. Discussion and conclusion

6.1. The importance and impact law of typical conditioning factors

It can be seen from Fig. 8 that elevation is the most important conditioning factor affecting the occurrence of landslides in the study area. The impact of elevation on landslide hazard can be explained by its close correlation with vegetation type, vegetation coverage, soil moisture, human engineering activities and rainfall. According to the distribution of historical landslides in each elevation class on the special elevation layer, the statistical diagram of landslide density within the elevation range of the study area was generated (Fig. 11a). It can be seen from the figure that, in general, the landslide density had a negative correlation with the elevation, and the landslide density was higher in places with lower elevation. Fengjie County is a typical mountainous area with complex terrain. Low elevation areas often have a loose soil bed, more human engineering activities and more frequent landslides, while high elevation areas have a tighter soil bed, fewer human engineering activities, higher vegetation coverage and less frequent landslides. Therefore, the most intensive elevation range of landslide distribution is the low elevation range with frequent human activities and low vegetation coverage.

Annual average rainfall is the second most important conditioning factor. The statistical chart of annual average rainfall and landslide density (Fig. 11b), generated from the data of multi-year average rainfall and landslide density, can be analyzed. The landslide density increased at first and then decreased as annual average rainfall continued to rise. Rainfall scoured the surface of the slope, and the unstable rock and soil particles on the surface of the slope were taken away by the surface runoff formed by rainfall, which led to the erosion of the slope. The high rainfall areas are stripped of soil/regolith. The annual average rainfall would also affect the development of vegetation, thus affecting the development of the landslide.

The least important conditioning factor was the distance from the fault. Earthquake-triggered landslides occurred mostly in the vicinity of the more concentrated active faults, with a dense distribution along the direction of the structural line (Wen et al., 2016). However, the susceptibility to disaster varies with the distance from the fault. The influence of distance from the fault is likely to be limited to a certain range (Ni et al., 2018). However, from Fig. 4(i), it can be found that there were only two fault zones in the landslide intensive area, which might be caused by the insufficient accuracy of fault data in the study area and the large class intervals used for the distance from the fault. The landslide sites did not show a direct relationship with the distance from the fault, which might be because the distance exceeded the range of influence.

In the past, the influence of human activities was seldom considered in susceptibility models. The influence of human engineering activities is a conditioning factor that cannot be ignored in the formation of landslides. All slopes excavated and backfilled manually have different degrees of deformation and damage. The distance from the house identified in this study was a conditioning factor that has not been investigated before, and its importance was as high as 7.24% in this study area, which contributed significantly to the occurrence of landslides.

To conclude the above discussion, in the importance evaluation of conditioning factors given by the random forest method, the 16 conditioning factors are not independent and each may have a certain correlation with other conditioning factors. Some conditioning factors may strengthen or weaken the importance of other related conditioning factors. Therefore, the importance analysis of conditioning factors is a comprehensive and integrated reflection of all conditioning factors' mutual restriction and balance. In order to reduce the correlation between conditioning factors, it is necessary to screen out the dominant conditioning factors to change the whole conditioning factor system.

6.2. Factor selection

In this paper, the dominant conditioning factors were selected from all the conditioning factors by recursive feature elimination. The purpose of the conditioning factors screening is to remove conditioning

Fig. 9. Landslide susceptibility map of Fengjie County.

factors that are not relevant or are redundant. In addition, a sufficient set of dominant conditioning factors can save the running time and input data resources of the model. It could be a reference for other similar studies. By using the recursive feature elimination method (Zhou et al., 2014), the last feature of the importance ranking was eliminated each time, and the accuracy of the model was calculated. The order of the importance of the conditioning factors obtained after each recursion was compared, and the accuracy was kept at about 99%. Hence, 9 dominant conditioning factors of landslide were finally selected, including elevation, distance from the buildings, land cover, lithology, annual average rainfall, distance from road, distance from river, NDVI, and slope. The confusion matrix test results of the final model are shown in Table 6 and the order of importance of the conditioning factors is shown in Fig. 12.

Among the final 9 dominant conditioning factors, elevation and annual average rainfall were still the most important two

Table 5
Statistic result of landslide susceptibility in different grades.

Susceptibility level              Grid number   Area proportion (%)   Landslide   Landslide proportion (%)   Density proportion
Very low susceptibility region    2,065,678     45.94                 92          6.05                       0.05
Low susceptibility region         979,313       21.78                 175         11.51                      0.20
Middle susceptibility region      632,168       14.06                 267         17.57                      0.46
High susceptibility region        541,381       12.04                 444         29.21                      0.90
Very high susceptibility region   277,458       6.17                  542         35.66                      2.14
Statistics                        4,495,998     100                   1520        100

Fig. 10. Examples of new landslides Map 2017.

factors, with an average accuracy reduction of 51.98 and 52.03, and the contribution rate of the lithology and slope factors was relatively high. The distance from river and the distance from road ranked behind, indicating that their contribution rate to the occurrence of landslides was relatively low.

As per the result of ranking, elevation, annual average rainfall and lithology were the dominant conditioning factors. For these three conditioning factors, we found that in a large number of studies (Chen et al., 2017a; Chen et al., 2017b; Chen et al., 2018; Hong et al., 2016; Youssef et al., 2015), both elevation and lithology have been included in the analysis of conditioning factors; however, annual average rainfall was left out of the articles of Chen et al. (2017b) and Chen et al. (2018). The reason why the annual average rainfall was not considered might be that the data was too complex to obtain, or it might be

Fig. 11. Partial effects on landslide susceptibility of typical conditioning factors: (a) elevation; (b) annual average rainfall.

replaced by some other conditioning factors (such as TWI, a soil moisture index indicating the water content and distribution in the soil, and rainfall, one of the sources of soil water). More likely, the landslides in those study areas were basically large, deep-seated or bedrock landslides, and the inducement tended to be earthquakes and other earth movements. Therefore, it shows that elevation and lithology are the essential dominant conditioning factors to evaluate landslides and the annual average rainfall is consistent with 82% of the historical landslides in the study area.

Table 6
Confusion matrix of random forest classification results.

                                      Actual value
                                      Non-landslide (0)   Landslide (1)    Accuracy
Predicted value   Non-landslide (0)   15,200              13               Precision: 0.9991
                  Landslide (1)       0                   1516             Precision: 1
                                      Recall: 1           Recall: 0.9914   Accuracy: 0.9992

6.3. The advance of model optimization

There has been much research on random forest; however, most of the research done in the past mainly focused on the comparison between random forest and other landslide evaluation models (including traditional statistical methods and mainstream machine learning algorithms). For example, Chen et al. (2017a) compared three advanced machine learning algorithms, i.e., LMT, CART and RF, for the evaluation accuracy of landslide susceptibility in Long County in China. Chen et al. (2018), taking Longhai, China as the study area, compared the accuracy of the Best-First Decision Tree, Random Forest and Naive Bayes Tree. The comparison results of the two studies showed that the RF model had the best accuracy. Most of the research results show that the random forest model is a promising mapping technique for landslide

Fig. 12. Importance ranking of conditioning factors of landslides after screening.

sensitivity. However, Youssef et al. (2015) compared the accuracy of four models, namely Random Forest (RF), Boosted Regression Tree (BRT), Classification And Regression Tree (CART) and Generalized Linear Model (GLM), and Hong et al. (2016) compared RF with traditional statistical models (EBF, LR, FR); the results of both studies show that the RF accuracy was generally higher than that of the other models.

It can be found that no conclusion has been drawn on the merits and demerits of the RF model relative to other models. The studies above applied the built-in parameters of the models without any model optimization. The models were therefore not necessarily optimal, and their accuracy could be further improved. The comparison between a non-optimized model and other models was not very convincing, as it did not truly reflect the advantages and disadvantages of each model for the specific study area. In order to improve the accuracy of the model, we can consider optimizing the parameters. Bayesian optimization (BO) relies on fitting a probability model to the observations of the black-box objective being optimized. Through iterative processing in the probability model, we can find the hyperparameters within a certain range. The hyperparameters with the highest AUC value of 0.91 were chosen to ensure the accuracy of the optimal model.

6.4. Conclusion

1) A random forest evaluation model of landslide susceptibility after Bayesian hyperparameter optimization was proposed in this study. A typical mountainous area with multiple landslides was taken as an example for application analysis, and the importance degree and influence rule of conditioning factors were analyzed.

2) The result of the random forest model after the hyperparameter optimization, which was applied to the case research area, indicated that the AUC values of the ROC curve for the training data set, the verification data set and the regional simulation were 0.95, 0.87 and 0.93, respectively. 65% of historical landslides fell in the high susceptibility regions, which cover an area of <20%, and the model had high reliability and stability. In 2017, most of the new typical landslides in the study area were located in the high susceptibility regions, and the model had high prediction ability.

3) The results of conditioning factor importance evaluation and impact law analysis of typical conditioning factors showed that elevation, annual average rainfall and other conditioning factors were the most important conditioning factors of landslide susceptibility. Human engineering factors, such as buildings and roads, were significant conditioning factors that shall not be ignored, while slope position and distance from fault were relatively insignificant. Hence, the factor set can be further optimized to obtain the dominant conditioning factors of landslide susceptibility to ensure accurate and efficient modeling and analysis.

4) Through Bayesian hyperparameter optimization, 5-fold cross-validation for the selection of the best sample, and dominant conditioning factor screening analysis, we can build a more efficient random forest landslide susceptibility evaluation model with high accuracy and fewer dominant conditioning factors. Among them, as the highlight of this paper, Bayesian hyperparameter optimization is used to find the hyperparameters within a certain range through iterative processing in the probability model. The corresponding ROC curve for each optimized parameter set is obtained through random forest and the parameter set with the highest AUC value is selected as the optimal hyperparameters.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We would like to express our gratitude to Chongqing Meteorological Administration for providing essential meteorological data, and Chongqing Institute of Geology and Mineral Resources for providing valuable research materials of historical slope disaster cases and new slope deformation/damage cases in the research area. We also thank our families and friends who helped us during the writing of this paper.

The current research is supported by grants: National Key R&D Program of China (Grant No. 2018YFC1505501), and the National Natural Science Foundation of China (Grant No. 41807498).

References

Ayalew, L., Yamagishi, H., 2005. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 65 (1–2), 15–31.
Breiman, L., 1996. Bagging predictors. Mach. Learn. 24 (2), 123–140.
Cao, Z., 2014. Study on Optimization of Random Forests Algorithm. Capital Economic and Trade University, Beijing Doctor dissertation (in Chinese).
Chen, X., Liu, M., 2005. Prediction of protein-protein interactions using random decision forest framework. Bioinformatics 21 (24), 4394–4400.

Chen, W., Xie, X., Wang, J., Pradhan, B., Hong, H., Bui, D.T., Duan, Z., Ma, J., 2017a. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 151, 147–160.
Chen, W., Pourghasemi, H.R., Panahi, M., Kornejady, A., Wang, J., Xie, X., Cao, S., 2017b. Spatial prediction of landslide susceptibility using an adaptive neuro-fuzzy inference system combined with frequency ratio, generalized additive model, and support vector machine techniques. Geomorphology 297, 59–85.
Chen, W., Zhang, S., Li, R., Shahabi, H., 2018. Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling. Sci. Total Environ. 644, 1006–1018.
Cutler, A., 2005. Random forests. American Cancer Society.
Das, I., Stein, A., Kerle, N., Dadhwal, V.K., 2012. Landslide susceptibility mapping along road corridors in the Indian Himalayas using Bayesian logistic regression models. Geomorphology 179 (60), 116–125.
Du, G., Zhang, Y., Iqbal, J., Yang, Z., Yao, X., 2017. Landslide susceptibility mapping using an integrated model of information value method and logistic regression in the Bailongjiang watershed, Gansu Province, China. J. Mt. Sci. 14 (2), 249–268.
Froude, M.J., Petley, D.N., 2018. Global fatal landslide occurrence from 2004 to 2016. Nat. Hazards Earth Syst. Sci. 18, 2161–2181.
Garrido-Merchán, E.C., Hernández-Lobato, D., 2019. Dealing with categorical and integer-valued variables in Bayesian Optimization with Gaussian processes. Neurocomputing 380, 20–35.
Guo, Z., Yin, K., Huang, F., Fu, S., Zhang, W., 2019. Evaluation of landslide susceptibility based on landslide classification and weighted frequency ratio model. Chin. J. Rock Mech. Eng. 38 (2), 287–300.
Hong, H., Pourghasemi, H.R., Pourtaghi, Z.S., 2016. Landslide susceptibility assessment in Lianhua County (China): a comparison between a random forest data mining technique and bivariate and multivariate statistical models. Geomorphology 259, 105–118.
Kang, L., Chen, R., Xiong, N., Chen, Y., Hu, Y., Chen, C., 2019. Selecting hyper-parameters of Gaussian process regression based on non-inertial particle swarm optimization in internet of things. IEEE Access 7, 59504–59513.
Kim, S., Lee, J., Ko, B., Nam, J., 2010. X-ray image classification using random forests with local binary patterns. Proceedings of the 9th International Conference on Machine Learning and Cybernetics. IEEE Computer Society, Qingdao, China, pp. 3190–3194.
Li, Z., 2013. Several Research on Random Forest Improvement. Xiamen University, Xiamen Master dissertation (in Chinese).
Li, T., Tian, Y., Wu, L., Liu, L., 2014. Landslide susceptibility mapping using random forest. Geography and Geo-Information Science 30 (06), 25–30.
Liu, X., Zhu, A., Yang, L., Pei, T., Liu, J., Zeng, C., Wang, D., 2020. A graded proportion method of training sample selection for updating conventional soil maps. Geoderma 357, 113939.
Ni, S., Ma, C., Yang, H., Zhang, Y., 2018. Spatial distribution and susceptibility analysis of avalanche, landslide and debris flow in Beijing mountain region. Journal of Beijing Forestry University 40 (06), 81–91.
Pang, H., Datta, D., Zhao, H., 2010. Pathway analysis using random forests with bivariate node-split for survival outcomes. Bioinformatics 26 (2), 250–258.
Reichenbach, P., Rossi, M., Malamud, B.D., Mihir, M., Guzzetti, F., 2018. A review of statistically-based landslide susceptibility models. Earth-Sci. Rev. 180, 60–91.
Sun, D., Wu, J., Wen, H., Xue, M., 2019. Damage resistance mapping of mountain slopes based on geospatial big data mining. Journal of Chongqing Normal University (Natural Science) 36 (3), 64–71.
Taalab, K., Cheng, T., Zhang, Y., 2018. Mapping landslide susceptibility and types using Random Forest. Big Earth Data 1–20.
UNOCHA, 2019. Asia and the Pacific: Weekly Regional Humanitarian Snapshot (6–13 August 2019). Available online. http://reliefweb.int/report/china/asia-and-pacific-weekly-regional-humanitarian-snapshot-6-13-august-2019.
Wan, Z., Dong, H., Liu, B., 2010. On choice of hyper-parameters of support vector machines for time series regression and prediction with orthogonal design. Rock Soil Mech. 31 (2), 503–508+515.
Wang, X., Huang, F., Cheng, Y., 2014. Super-parameter selection for Gaussian-Kernel SVM based on outlier-resisting. Measurement 58, 147–153.
Ward, M.M., Pajevic, S., Dreyfuss, J., Malley, J.D., 2006. Short-term prediction of mortality in patients with systemic lupus erythematosus: classification of outcomes using random forests. Arthritis Rheum. 55 (1), 74–80.
Weiss, A., 2001. Topographic position and landforms analysis. Proceedings of ESRI User Conference. San Diego, CA, USA, pp. 9–13.
Wen, H., Xie, P., Xiao, P., Hu, D., 2016. Rapid susceptibility mapping of earthquake-triggered slope geohazards in Lushan County by combining remote sensing and the AHP model developed for the Wenchuan earthquake. Bull. Eng. Geol. Environ. 76 (3), 909–921.
Wen, H., Wang, G., Huang, X., Xue, J., Xie, P., Zhang, Y., 2017. A Preliminary Evaluation Method of Slope Stability Based on Topographic Map and Geological Map. Chinese patent No. 2017105719823 (in Chinese).
Xie, Y., Li, X., Ngai, E.W.T., Ying, W., 2009. Customer churn prediction using improved balanced random forests. Expert Syst. Appl. 36 (3), 5445–5449.
Xie, P., Wen, H., Ma, C., Baise, L.G., Zhang, J., 2018. Application and comparison of Logistic regression model and Neural network model in earthquake-induced landslides susceptibility mapping at mountainous region, China. Geomat. Nat. Haz. Risk 9 (1), 501–523.
Yin, K., Zhu, L., 2001. Landslide hazard zonation and application of GIS. Earth Sci. Front. 8 (2), 279–284.
Ying, W., Li, X., Xie, Y., Johnson, E., 2008. Preventing customer churn by using random forests modeling. Proceedings of the 7th IEEE International Conference on Information Reuse and Integration. IEEE Computer Society, Las Vegas, USA, pp. 429–434.
Youssef, A.M., Pourghasemi, H.R., Pourtaghi, Z.S., Al-Katheeri, M.M., 2015. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 13 (5), 839–856.
Yu, K., Yao, X., Qiu, Q., Liu, J., 2016. Landslide spatial prediction based on random forest model. Transactions of the Chinese Society for Agricultural Machinery 47 (10), 338–345.
Yu, H., Luo, L., Ma, H., Li, H., 2017. Application appraisal in catchment hydrological analysis based on SRTM 1 Arc-Second DEM. Remote Sens. Land Resour. 29 (2), 138–143.
Zhou, Q., Zhou, H., Zhou, Q., Yang, F., Luo, L., 2014. Structure damage detection based on random forest recursive feature elimination. Mech. Syst. Signal Process. 46 (1), 82–90.

19 pages
Landslide Susceptibility Assessment Through Multi-Model Stacking and Meta-Learning in Poyang County China
No ratings yet
Landslide Susceptibility Assessment Through Multi-Model Stacking and Meta-Learning in Poyang County China
24 pages
Medina Et Al - 2021 - Physicaly - Based - Model - LDSLD - Susceptibility
No ratings yet
Medina Et Al - 2021 - Physicaly - Based - Model - LDSLD - Susceptibility
16 pages
Remotesensing 14 04662 v2
No ratings yet
Remotesensing 14 04662 v2
24 pages
Geosciences 10 00483 v2 - 2
No ratings yet
Geosciences 10 00483 v2 - 2
26 pages
Pourghasemi2013 Article LandslideSusceptibilityMapping
No ratings yet
Pourghasemi2013 Article LandslideSusceptibilityMapping
31 pages
Optimizing The Predictive Ability of Machine Learning Methods For Landslide Susceptibility Mapping Using SMOTE For Lishui City in Zhejiang Province, China
No ratings yet
Optimizing The Predictive Ability of Machine Learning Methods For Landslide Susceptibility Mapping Using SMOTE For Lishui City in Zhejiang Province, China
27 pages
Final Year Project 26
No ratings yet
Final Year Project 26
23 pages
Optimizing Rainfall-Triggered Landslide Thresholds
No ratings yet
Optimizing Rainfall-Triggered Landslide Thresholds
23 pages
Landslide Susceptibility Mapping at Hoa Binh Province (Vietnam) Using An Adaptive Neuro-Fuzzy Inference System and GIS
No ratings yet
Landslide Susceptibility Mapping at Hoa Binh Province (Vietnam) Using An Adaptive Neuro-Fuzzy Inference System and GIS
13 pages
R2032051
No ratings yet
R2032051
7 pages
2021-Landslide Susceptibility Mapping Using Hybrid Random Forest With GeoDetector and RFE For Factor Optimization
No ratings yet
2021-Landslide Susceptibility Mapping Using Hybrid Random Forest With GeoDetector and RFE For Factor Optimization
19 pages
A New Approach To Assess Landslide Susceptibility Based On Slope
No ratings yet
A New Approach To Assess Landslide Susceptibility Based On Slope
15 pages
10.1515 - Arh 2022 0122
No ratings yet
10.1515 - Arh 2022 0122
12 pages
OK-LR-Wu-A Comparative Study On The Landslide Susceptibility Mapping Using Logistic Regression and Statistical Index Models-2017
No ratings yet
OK-LR-Wu-A Comparative Study On The Landslide Susceptibility Mapping Using Logistic Regression and Statistical Index Models-2017
17 pages
2021-AI-powered Landslide Susceptibility Assessment in Hong Kong
No ratings yet
2021-AI-powered Landslide Susceptibility Assessment in Hong Kong
18 pages
OK-NF-EBF-DieuTienBui-Fuzzy Logic-Spatial Prediction of Landslide Hazards in Hoa Binh Province (Vietnam) - 2012b
No ratings yet
OK-NF-EBF-DieuTienBui-Fuzzy Logic-Spatial Prediction of Landslide Hazards in Hoa Binh Province (Vietnam) - 2012b
13 pages
SM Sbe13e Chapter 21
No ratings yet
SM Sbe13e Chapter 21
20 pages
Landslide Susceptibility Assessment Based On Remote Sensing Interpretation and DBN-MLP Model: A Case Study of Yiyuan County, China
No ratings yet
Landslide Susceptibility Assessment Based On Remote Sensing Interpretation and DBN-MLP Model: A Case Study of Yiyuan County, China
16 pages
Engineering Geology: Hyuck Jin Park, Jung Hyun Lee, Ik Woo
No ratings yet
Engineering Geology: Hyuck Jin Park, Jung Hyun Lee, Ik Woo
15 pages
Application of Bagging Boosting and Stacking Ensem
No ratings yet
Application of Bagging Boosting and Stacking Ensem
18 pages
Rule Engine-Decision trees-JP
No ratings yet
Rule Engine-Decision trees-JP
18 pages
Bragagnolo 2020
No ratings yet
Bragagnolo 2020
16 pages
Engineering Geology: Deliang Sun, Jiahui Xu, Haijia Wen, Danzhou Wang
No ratings yet
Engineering Geology: Deliang Sun, Jiahui Xu, Haijia Wen, Danzhou Wang
12 pages
OK-NN-LM-Bayesian-DieuTienBui-Landslide Susceptibility Assessment in The Hoa Binh Province of Vietnam-2012a-reliefAmplitu
No ratings yet
OK-NN-LM-Bayesian-DieuTienBui-Landslide Susceptibility Assessment in The Hoa Binh Province of Vietnam-2012a-reliefAmplitu
18 pages
2021-GIS-based Landslide Susceptibility Assessment Using Optimized Hybrid Machine Learning Methods
No ratings yet
2021-GIS-based Landslide Susceptibility Assessment Using Optimized Hybrid Machine Learning Methods
16 pages
Water 12 03066 v2
No ratings yet
Water 12 03066 v2
22 pages
Machine Learning Approaches For Mapping and Predicting Landslide-Prone Areas in SAo SebastiAo (Southeast Brazil)
No ratings yet
Machine Learning Approaches For Mapping and Predicting Landslide-Prone Areas in SAo SebastiAo (Southeast Brazil)
15 pages
1 s2.0 S0048969720308305 Main
No ratings yet
1 s2.0 S0048969720308305 Main
16 pages
A Data Driven Approach For Landslide Susceptibility Mapping A Case Study of Shennongjia Forestry District China
No ratings yet
A Data Driven Approach For Landslide Susceptibility Mapping A Case Study of Shennongjia Forestry District China
18 pages
A Comparison of Logistic Regression-Based Models of Susceptibility To Landslides in Western Colorado, USA
No ratings yet
A Comparison of Logistic Regression-Based Models of Susceptibility To Landslides in Western Colorado, USA
16 pages
2021-Discriminant Analysis As An Efficient Method For Landslide Susceptibility Assessment in Cities With The Scarcity of Predisposition Data
No ratings yet
2021-Discriminant Analysis As An Efficient Method For Landslide Susceptibility Assessment in Cities With The Scarcity of Predisposition Data
16 pages
DECISION TREES-jb
No ratings yet
DECISION TREES-jb
8 pages
Notes On Module 3 - Pattern Recognition
No ratings yet
Notes On Module 3 - Pattern Recognition
17 pages
Spatial Landslide Susceptibility Assessment Using Machine Lear - 2021 - Geoscien
No ratings yet
Spatial Landslide Susceptibility Assessment Using Machine Lear - 2021 - Geoscien
13 pages
OK-LR-Ramani-GIS Based Landslide Susceptibility Mapping Using Binaray Logistic Regression Analysis-2011
No ratings yet
OK-LR-Ramani-GIS Based Landslide Susceptibility Mapping Using Binaray Logistic Regression Analysis-2011
13 pages
Landslide Susceptibility Assessment Using The Maximum Entropy Model in A Sector of The Cluj-Napoca Municipality, Romania
No ratings yet
Landslide Susceptibility Assessment Using The Maximum Entropy Model in A Sector of The Cluj-Napoca Municipality, Romania
17 pages
Lee2002 PDF
No ratings yet
Lee2002 PDF
12 pages
Hybrid Model Considering Spatial Heterogeneity For Landslide Susceptibility Mapping in Zhejiang Province, China
No ratings yet
Hybrid Model Considering Spatial Heterogeneity For Landslide Susceptibility Mapping in Zhejiang Province, China
13 pages
Geomorphology: Jason N. Goetz, Richard H. Guthrie, Alexander Brenning
No ratings yet
Geomorphology: Jason N. Goetz, Richard H. Guthrie, Alexander Brenning
11 pages
Catena: A A B C D
No ratings yet
Catena: A A B C D
13 pages
DWM - END SEM LAB Questions
No ratings yet
DWM - END SEM LAB Questions
9 pages
Detect AI-generated Text Using Machine Learning
No ratings yet
Detect AI-generated Text Using Machine Learning
5 pages
Coal Blending Models For Optimum Cokemaking
No ratings yet
Coal Blending Models For Optimum Cokemaking
10 pages
IB Questionbank
No ratings yet
IB Questionbank
1 page
Quiz 2.doc Ready
No ratings yet
Quiz 2.doc Ready
3 pages
Decision Analysis
No ratings yet
Decision Analysis
50 pages

1. Introduction

losses in terms of damages of houses, crops, etc. Besides, as Froude

Fig. 1. Schematic representation of computational methodology.

2.2. Data sources

3.1. Conditioning factors

Fig. 2. Location of the study case, Fengjie County, China.

(a) Landslide type (b) Trigger

Fig. 3. Scale of landslide type and trigger.

Table 2 to select the optimal hyperparameter values according to the evaluation

(a) Elevation (b) Slope (c) Aspect (d) Slope position (e) (f) Profile curvature (g) TWI (h) Lithology (i) Distance from faults (j) CRDS (k) NDVI (l) Distance from rivers (o) Distance from buildings (p) Annual average rainfall

Fig. 4. Conditioning factors of landslide susceptibility.

Fig. 5. The process of random forest.

n_estimators: The number of decision trees.

fold cross-validation method divided the whole dataset (1520 positive

Fig. 8. The sorting of conditioning factors.

Fig. 9. Landslide susceptibility map of Fengjie County.

Very low susceptibility region 2,065,678 45.94 92 6.05 0.05

Fig. 10. Examples of new landslides Map 2017.

(b) Annual average rainfall

Fig. 11. Partial effects on landslide susceptibility of typical conditioning factors.

Fig. 12. Importance ranking of conditioning factors of landslides after screening.

6.4. Conclusion

Acknowledgements
