1. Introduction
In recent years, global warming has become a major worldwide concern. Rising temperatures have increased the frequency and intensity of extreme events, leading to various natural disasters and posing significant risks to ecosystems and human societies. Among these hazards, rainfall-induced landslides are particularly sensitive to climate change. China is heavily affected by landslides, approximately 90% of which are triggered by intense or prolonged rainfall [1]. According to the “National Geological Disaster Bulletin”, China experienced 200,000 geological disasters from 2007 to 2020, 66.31% of which were landslides, resulting in 8170 people dead or missing, 3850 injuries, and direct economic losses of 60.51 billion yuan [2]. To mitigate the impact of rainfall-induced landslides, it is essential to predict and assess landslide risks under future climate change conditions, providing a scientific basis for disaster prevention and mitigation strategies.
The causes of landslides are complex, involving both internal environmental conditions and external triggering factors [3]. Internal conditions include the slope’s lithology, topography, hydrology, and geological characteristics. External factors, such as rainfall and earthquakes, provide the dynamic conditions that trigger landslides. Most landslides result from the interaction between these internal conditions and external triggers [4].
Landslide-susceptibility assessment methods evaluate the influence of static factors on landslide occurrence and predict the likelihood of landslides in a specific area, a process often referred to as landslide susceptibility analysis [5,6,7]. These methods typically use geographic information systems (GIS) to collect data on static factors such as geology, topography, hydrology, soil, and vegetation. Statistical analysis is then employed to develop landslide susceptibility models, which are essential tools for regional landslide spatial prediction [8]. In the tropical environment of Malaysia, Himan Shahabi and Mazlan Hashim used GIS-based statistical models and remote sensing data to achieve landslide-susceptibility calculations with an accuracy of up to 96% [9]. Amina Abdi et al. employed GIS-based fuzzy logic and the analytic hierarchy process to map landslide susceptibility and found that the fuzzy logic method was the more effective of the two, demonstrating higher consistency and accuracy [10].
The external triggers, earthquakes and rainfall, initiate landslides through different mechanisms. This study focuses on rainfall-triggered landslides. Most research examines the relationship between landslides and rainfall, using methods that fall mainly into theoretical and statistical analyses [11,12,13,14]. Theoretical analysis investigates the mechanisms of rainfall-induced landslides by developing physical models [15]. However, due to the complexity of the physical processes and their site-specific characteristics, this approach is better suited to slopes within a limited region and is impractical for large-scale landslide prediction.
Statistical analysis methods rely on extensive historical landslide data and associated rainfall information. By establishing statistical relationships between landslides and rainfall, researchers can determine rainfall threshold values that trigger landslides. This approach is the most commonly used for landslide prediction. Rainfall thresholds, the critical values at which rainfall induces slope instability, vary based on the rainfall characterization variables used. When only cumulative rainfall is considered, the threshold is a specific value, such as the 1-day or 3-day cumulative rainfall threshold [16]. When two rainfall variables are considered together, the threshold typically takes the form of a power function curve [17]. Statistical analysis methods have been widely applied in practice and play a significant role in landslide prediction and early warning. Statistics indicate that approximately three-quarters of the world’s landslide-prediction and early warning systems are based on empirical rainfall threshold frameworks [18].
Apart from the rainfall that triggers a landslide, the soil water content resulting from antecedent rainfall has also been shown to be important for landslide occurrence [19,20,21]. Antecedent rainfall raises the water content of slope materials, increasing pore water pressure, reducing stability, and triggering landslides. Some studies have used antecedent soil moisture to predict landslides; however, antecedent soil moisture is difficult to measure and exhibits significant spatial heterogeneity, necessitating more complex models for accurate prediction [22]. Antecedent rainfall, by contrast, is easily obtainable and can be monitored in real time using weather stations and satellite data. Thus, considering antecedent rainfall can effectively account for the influence of previous soil-moisture conditions on slope stability [23].
Climate change-induced increases in surface temperatures and alterations in precipitation patterns have led to more frequent extreme rainfall events in many regions worldwide [24]. In China, He et al. first utilized a landslide statistical forecasting model and the regional climate model RegCM4.0 to project a significant increase in landslides during the 21st century under the high emission scenario RCP8.5 [25]. Similarly, Ge et al. employed global climate model precipitation data and statistical models to forecast that, under the RCP8.5 scenario in 2050, the landslide risk in China will generally rise [26]. However, these studies considered only precipitation and ignored antecedent soil wetness, which inevitably influences the occurrence of rainfall-induced landslides and other geological disasters [27].
In summary, this study develops two methods based on cumulative rainfall and the antecedent soil-wetness index, respectively, to assess the landslide risk in China under climate change. These methods are integrated with topography, geomorphology, and rock-soil characteristics to construct a landslide threshold model. Using multiple CMIP6 GCM ensembles, the models simulate rainfall-induced landslides under climate change scenarios. The aim is to project changes in the distribution of rainfall-induced landslides in China under future climate scenarios, identify high-risk areas, and provide a scientific basis for disaster prevention and mitigation.
2. Study Area and Data
The study area is China, which is situated in the eastern part of the Eurasian continent, on the western coast of the Pacific Ocean. Its geographical coordinates extend from 73°40′ E to 135°2′30″ E and from 3°52′ N to 53°55′ N, encompassing a land area of over 9.6 million km². Because climatic conditions, land features, and other geographical environments are diverse across this vast territory, the occurrence of landslides varies significantly from region to region.
To construct the landslide-prediction model, this study utilized the following data:
Rainfall. This study used the CHIRPS (Climate Hazards group InfraRed Precipitation with Station data) satellite rainfall product, developed by the U.S. Geological Survey and the Climate Hazards Group at the University of California, Santa Barbara. CHIRPS offers multiple temporal scales (daily to monthly), spatial resolutions (0.05° to 0.25°), near-global coverage (50° S to 50° N), and long-term records (from 1981 to the present). The dataset used in this study spans from 1 January 1981 to 31 December 2020, with a spatial resolution of 0.05° × 0.05°.
Temperature. This study used the CHIRTS (Climate Hazards Center InfraRed Temperature with Stations data) daily temperature dataset, developed by the Climate Hazards Center at the University of California, Santa Barbara. This product provides daily maximum and minimum temperatures from 1983 to 2016, covering latitudes from 60° S to 70° N, with resolutions of 0.25° × 0.25° and 0.05° × 0.05°. We used the CHIRTS dataset with the 0.05° × 0.05° resolution to obtain daily average temperatures. To address data gaps in CHIRTS (1981–1982 and 2017–2020), we supplemented it with the CN05.1 daily observational dataset (0.25° × 0.25°) published by the National Climate Center of the China Meteorological Administration [28], applying bilinear interpolation to align the resolution.
Climate datasets. This study also incorporates multi-model climate datasets from the Coupled Model Intercomparison Project Phase 6 (CMIP6). To ensure consistency, we selected 15 climate models that share the same variant label (r1i1p1f1) (Table 1). Each model provides daily precipitation and temperature data for both the reference period (1981–2014) and the projection period (2015–2100). The climate change scenarios used are SSP1-2.6, SSP2-4.5, and SSP5-8.5. The projection period is further divided into near-term (2015–2040), mid-term (2041–2070), and long-term (2071–2100).
Although CMIP6 has improved in model accuracy and resolution, significant discrepancies still exist between GCM-simulated climate changes and the actual climate system due to its complexity. These discrepancies make it challenging to use GCM simulations directly for future climate change predictions. To achieve more accurate future rainfall-induced landslide simulations that reflect the real conditions of the study area, the GCM simulation data used to drive the landslide-prediction model must align with actual climate conditions. Therefore, this study employs the grid cell as the basic unit and uses the Quantile Delta Mapping (QDM) and Multivariate Bias Correction (MBC) methods to correct biases in the GCMs [29,30]. Additionally, to address the resolution differences between GCM-simulated rainfall and observed rainfall (CHIRPS) and to ensure consistency across models, the Spatial Disaggregation (SD) method is applied to downscale the bias-corrected models, standardizing the resolution to 0.05° × 0.05° [31,32].
Landslide dataset. We used an observational landslide dataset to validate the landslide-prediction model. The records are from the Global Landslide Catalog (GLC), with examples of landslide information provided in Table 2. Since this study focuses on rainfall-induced landslides, the GLC database was further filtered by the triggering factor. Given the sparse landslide records before 2005, we selected events that occurred in China post-2006 and were triggered by rainfall. This process resulted in a final selection of 482 rainfall-induced landslide records.
3. Methodology
This study employs the LHASA (Landslide Hazard Assessment for Situational Awareness) model for assessing landslide risk. Developed by the NASA Goddard Space Flight Center, this model aims to identify potential landslides and provide near-real-time warning signals [33]. The model includes two components: static variables contributing to slope instability (landslide sensitivity maps) and a threshold model for landslide prediction.
The decision-making process of the LHASA model is shown in Figure 1. First, hydrometeorological characteristics, such as cumulative rainfall and the antecedent effective rainfall index, are obtained for each grid cell. Using the relationship between these factors and historical landslides, a landslide threshold model is constructed to determine if an extreme rainfall warning should be issued. If no extreme rainfall warning is issued, landslides are unlikely to occur in the area. If an extreme rainfall warning is issued, the sensitivity map is then considered. Areas with low sensitivity have a low probability of landslides and do not receive a risk warning. Areas with medium sensitivity receive a moderate-risk warning, and areas with high sensitivity receive a high-risk warning.
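The decision process described above can be sketched for a single grid cell as follows. This is a minimal illustration rather than the LHASA implementation: the function name, the string labels, and the choice of a strict exceedance test (index greater than threshold) are assumptions made for the example.

```python
def lhasa_warning(rain_index, threshold, sensitivity):
    """Return 'none', 'moderate', or 'high' for one grid cell.

    rain_index  : cumulative rainfall or antecedent effective rainfall index
    threshold   : the cell's historical extreme value
    sensitivity : 'low', 'medium', or 'high' class from the sensitivity map
    """
    levels = {"low": "none", "medium": "moderate", "high": "high"}
    if sensitivity not in levels:
        raise ValueError(f"unknown sensitivity class: {sensitivity!r}")
    # Step 1: without an extreme-rainfall warning, no landslide warning is issued.
    # ASSUMPTION: "exceeds" is taken as strictly greater than the threshold.
    if rain_index <= threshold:
        return "none"
    # Step 2: the sensitivity class sets the warning level.
    return levels[sensitivity]
```

For example, `lhasa_warning(82.0, 60.0, "medium")` returns `"moderate"`, whereas the same rainfall over a low-sensitivity cell returns `"none"`.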
3.1. Bias Correction of Climate Models
Due to the inherent complexity of the climate system, GCMs often exhibit significant biases when simulating climate changes, making direct use of these models for future climate predictions challenging. Therefore, it is essential to correct the GCM data to match the actual climate conditions of the region before they can be used as inputs for the landslide-prediction models. Two bias-correction methods are employed: Quantile Delta Mapping (QDM), proposed by Cannon et al. (2015) [30], and Multivariate Bias Correction (MBC), which uses the N-dimensional Probability Density Function Transform (N-pdft), as described by Cannon (2018) [29]. For bias correction, this study uses 1981 to 2010 as the reference period and 2011 to 2014 as the validation period, with grid cells serving as the basic unit.
(1) Quantile Delta Mapping (QDM)
QDM operates on the assumption that the biases in climate models are consistent over time, implying that the biases observed in the historical period will persist into the future. First, simulated precipitation values are adjusted using the inverse of the observed precipitation-distribution function to correct systematic biases. Next, the relative changes in quantiles between the historical and future periods are calculated. Finally, these factors are combined to yield bias-corrected precipitation values. Thus, QDM effectively corrects the model’s systematic biases while preserving the relative changes projected by the model. In the notation of Cannon et al. [30], the calculation is as follows:
$$\tau_{m,p}(t) = F_{m,p}\left[x_{m,p}(t)\right]$$
$$\Delta_m(t) = \frac{x_{m,p}(t)}{F_{m,h}^{-1}\left[\tau_{m,p}(t)\right]}$$
$$\hat{x}_{o:m,h:p}(t) = F_{o,h}^{-1}\left[\tau_{m,p}(t)\right]$$
$$\hat{x}_{m,p}(t) = \hat{x}_{o:m,h:p}(t)\,\Delta_m(t)$$
In these formulas, $x_{o,h}$ represents the observed precipitation during the historical period (with cumulative distribution function $F_{o,h}$), and $x_{m,p}$ denotes the simulated precipitation for the future period. The terms $F_{m,h}$ and $F_{m,p}$ refer to the cumulative distribution functions of the model for the historical and future periods, respectively. $\hat{x}_{o:m,h:p}$ is the detrended simulated precipitation for the future period, $\Delta_m$ indicates the model’s predicted trend value (the relative change in quantiles), and $\hat{x}_{m,p}$ represents the bias-corrected simulated precipitation for the future period.
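The three QDM steps can be illustrated with a short pure-Python sketch for one grid cell. It is not the code used in this study: the function names, the empirical-CDF estimator (rank divided by n + 1), the linear interpolation of quantiles, and the small `eps` guard against division by zero on dry quantiles are all assumptions chosen to keep the example self-contained.

```python
from bisect import bisect_right


def empirical_quantile(sorted_sample, tau):
    """Inverse empirical CDF: linear interpolation between order statistics."""
    pos = tau * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    return sorted_sample[lo] + (pos - lo) * (sorted_sample[hi] - sorted_sample[lo])


def qdm_precipitation(obs_hist, mod_hist, mod_fut, eps=1e-9):
    """Bias-correct future model precipitation, preserving relative changes."""
    obs_sorted = sorted(obs_hist)
    hist_sorted = sorted(mod_hist)
    fut_sorted = sorted(mod_fut)
    n = len(fut_sorted)
    corrected = []
    for x in mod_fut:
        # Non-exceedance probability of x within the future simulation.
        tau = bisect_right(fut_sorted, x) / (n + 1)
        # Relative change between future and historical model quantiles.
        # ASSUMPTION: eps avoids division by zero when the quantile is dry.
        delta = x / max(empirical_quantile(hist_sorted, tau), eps)
        # Observed quantile at the same probability, scaled by the change.
        corrected.append(empirical_quantile(obs_sorted, tau) * delta)
    return corrected
```

If the model runs twice as wet as the observations in the historical period and projects a 50% increase, the corrected series is the observed series scaled by 1.5: the bias is removed and the projected relative change is retained.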
(2) Multivariate Bias Correction Method (MBCn)
The MBCn method extends QDM to a multivariate framework by integrating QDM with the image-processing technique known as N-pdft. In the context of climate model bias correction, Cannon applied orthogonal transformations to create linear combinations of original variables, facilitating univariate quantile mapping bias correction for multivariate distributions. The algorithm consists of three steps:
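First, an N × N uniformly distributed random orthogonal matrix $R^{[j]}$ is constructed, and orthogonal transformations are applied to both the source and target data:
$$\tilde{X}_{m,h}^{[j]} = X_{m,h}^{[j]} R^{[j]}, \qquad \tilde{X}_{m,p}^{[j]} = X_{m,p}^{[j]} R^{[j]}, \qquad \tilde{X}_{o,h}^{[j]} = X_{o,h}^{[j]} R^{[j]}$$
In the formula, $X_{m,h}$ denotes the simulated data for the historical period, $X_{m,p}$ denotes the simulated data for the future period, $X_{o,h}$ denotes the observed data for the historical period, and $j$ denotes the iteration count.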
Second, the orthogonally transformed data are input into Formulas (4)–(6) to adjust their marginal distributions using the QDM method. Finally, the results are inversely transformed. These three steps are repeated iteratively until the multivariate distribution aligns with the target distribution. In this study, we chose to perform 30 iterations [29].
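The rotation and inverse rotation in one MBCn iteration can be sketched as below. This is an illustrative fragment only: the helper names and the toy two-variable data are hypothetical, the random orthogonal matrix is built here by Gram-Schmidt orthonormalisation of Gaussian vectors (an assumption; the source does not say how the matrix is generated), and the marginal correction in the second step is left out.

```python
import random


def random_orthogonal(n, rng):
    """Orthonormalise n Gaussian vectors (Gram-Schmidt) into an n x n matrix."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:  # redraw in the unlikely degenerate case
            rows.append([a / norm for a in v])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


rng = random.Random(0)
R = random_orthogonal(2, rng)
# Hypothetical data: rows are days, columns are precipitation and temperature.
data = [[1.2, 15.0], [0.0, 17.5], [8.4, 12.1]]
rotated = matmul(data, R)                   # step 1: rotate
# Step 2 (not shown): univariate QDM on each column of `rotated`.
restored = matmul(rotated, transpose(R))    # step 3: rotate back with R^T
```

Because the matrix is orthogonal, its transpose is its inverse, so rotating back without any marginal correction recovers the original data; with the correction in place, repeated random rotations progressively adjust the joint distribution.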
3.2. Landslide-Sensitivity Calculation
This study adopts the landslide-sensitivity calculation method of Stanley and Kirschbaum [34], using a fuzzy overlay model to combine seven environmental factors (slope, geological lithology, distance to fault zones, roads, water systems, newly developed urban areas, and forest loss) into a 1-km resolution sensitivity map of China.
Firstly, the nonlinear relationship between environmental factors and landslide sensitivity is modeled using fuzzy membership functions, which convert the characteristic values of these factors into membership degrees ranging from 0 to 1, where higher values indicate greater sensitivity. Then, the “fuzzy gamma” operator is employed to combine these membership degrees, producing the initial sensitivity output set.
To facilitate understanding and further use, the sensitivity values are classified into five categories: very low, low, medium, high, and very high. The categories are sized so that the number of grid cells halves with each step up in sensitivity: the “very low” category contains twice as many grid cells as the “low” category, the “low” category contains twice as many as the “medium” category, and so on. Ultimately, the “very low” and “low” categories are combined into low sensitivity, the “medium” and “high” categories into medium sensitivity, and the “very high” category remains as high sensitivity. Low sensitivity indicates a lower likelihood of landslides in the area, while high sensitivity indicates a higher likelihood of landslides [34].
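The fuzzy gamma combination and the halving classification can be sketched as follows. The sketch uses the common definition of the fuzzy gamma operator (fuzzy algebraic sum raised to γ, multiplied by the fuzzy algebraic product raised to 1 − γ); the value γ = 0.9, the function names, and the rank-based class boundaries (shares of 16:8:4:2:1) are assumptions for illustration and are not taken from the source.

```python
from math import prod


def fuzzy_gamma(memberships, gamma=0.9):
    """Combine membership degrees in [0, 1] with the fuzzy gamma operator."""
    fuzzy_sum = 1.0 - prod(1.0 - m for m in memberships)
    fuzzy_product = prod(memberships)
    return fuzzy_sum ** gamma * fuzzy_product ** (1.0 - gamma)


def classify(values):
    """Rank cells, cut them into five classes sized 16:8:4:2:1, then merge
    the classes into low / medium / high sensitivity."""
    merged = ["low", "low", "medium", "medium", "high"]
    # Cumulative upper bounds of the five classes as fractions of all cells.
    bounds = [16 / 31, 24 / 31, 28 / 31, 30 / 31, 1.0]
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, i in enumerate(order):
        frac = (rank + 1) / len(values)
        k = next(j for j, b in enumerate(bounds) if frac <= b + 1e-12)
        labels[i] = merged[k]
    return labels
```

With γ = 1 the operator reduces to the fuzzy sum and with γ = 0 to the fuzzy product, so intermediate values trade off between the two. Under the 16:8:4:2:1 split, roughly 77% of cells end up in the merged low class, 19% in medium, and 3% in high.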
3.3. Construction of the Landslide Threshold Model
This study develops a univariate threshold model, where a single variable’s critical value serves as the landslide-triggering threshold. First, historical rainfall data is collected to create a continuous dataset. Each grid cell is assigned an extreme value, defined as the 95th percentile of historical rainfall, as the landslide threshold. Comparing target rainfall to this threshold allows for determining the likelihood of a landslide event in the area.
In constructing the univariate rainfall threshold model, cumulative rainfall and antecedent effective rainfall are used as the rainfall characterization variables, denoted as MODEL 1 and MODEL 2, respectively. Daily rainfall alone is not sufficient to identify extreme rainfall events: continuous rainfall over several days can increase soil moisture and trigger landslides even when the daily rainfall is low. Studies by Mirus et al. have shown that incorporating 3-day antecedent rainfall improves the performance of landslide-prediction models, and a 3-day period is commonly used in other studies to distinguish between antecedent and recent rainfall [35]. To better evaluate the impact of rainfall on landslides, the rainfall from the 3 days preceding an event is used as the recent rainfall indicator, referred to as the 3-day cumulative rainfall.
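A minimal sketch of the cumulative-rainfall threshold for a single grid cell is given below. The function names are hypothetical, and two details are assumptions because the text does not specify them: the 3-day window is taken to end on the day being assessed, and the 95th percentile is computed over all days (wet and dry) with linear interpolation.

```python
def cumulative_rainfall(daily, window=3):
    """Rolling sum over `window` days ending on each day.

    ASSUMPTION: the window includes the assessed day; the first days of the
    series use a shorter window because earlier data are unavailable.
    """
    return [sum(daily[max(0, i - window + 1): i + 1]) for i in range(len(daily))]


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    s = sorted(values)
    pos = (q / 100.0) * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def extreme_rainfall_days(historical_daily, target_daily, window=3, q=95.0):
    """Flag target days whose cumulative rainfall exceeds the historical threshold."""
    threshold = percentile(cumulative_rainfall(historical_daily, window), q)
    return [c > threshold for c in cumulative_rainfall(target_daily, window)]
```

In the study's setting, `historical_daily` would be the reference-period rainfall for the cell and `target_daily` the period being assessed; the same threshold logic applies to the antecedent effective rainfall index in place of the rolling sum.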
As soil wetness is difficult to measure in high spatial-temporal resolution and it is highly related to rainfall, in this study the antecedent effective rainfall index is utilized to represent the soil-wetness condition, based on prior daily rainfall data. It should be noted that the antecedent effective rainfall index functions as a soil moisture index, enabling the estimation of the relative wetness condition of the soil [36]. The antecedent effective rainfall index accounts for losses due to processes like evaporation and runoff by using a decay coefficient. After these losses are deducted, the antecedent rainfall is accumulated. The calculation equation is as follows:
In the equation, $t$ represents the number of prior days, $p_t$ denotes the rainfall on the $t$-th day before, and $w_t$ is the decay coefficient, a weight assigned to the rainfall on the $t$-th day before.
This formula shows that the antecedent effective rainfall index is determined by the combination of the number of days and the decay coefficient. Different choices of t significantly impact the model. To find the optimal number of days, we compare 3-day and 7-day antecedent rainfall amounts, with the decay coefficient w ranging from 1 to 3. When w = 1, the ARI equals the cumulative antecedent rainfall without any decay. Higher values of w indicate greater losses of antecedent rainfall due to evaporation and runoff.
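The behaviour of the index can be illustrated with a small sketch. The exact weight function is not recoverable from the text here, so the sketch assumes a geometric decay in which rainfall from the t-th prior day is divided by w raised to the power t. This form is an assumption chosen only because it reproduces the two stated properties: w = 1 gives the plain cumulative rainfall, and larger w gives greater losses. The function name is hypothetical.

```python
def antecedent_rainfall_index(prior_rain, w):
    """Weighted sum of prior daily rainfall.

    prior_rain[0] is the most recent day, prior_rain[1] the day before, etc.
    ASSUMPTION: the weight of day t is w ** (-t); the source's weight
    function may differ.
    """
    if w < 1.0:
        raise ValueError("decay coefficient w is expected to be >= 1")
    return sum(p / w ** t for t, p in enumerate(prior_rain))
```

With `prior_rain = [10.0, 20.0, 40.0]`, the index equals 70.0 for w = 1 (no decay) and 30.0 for w = 2, showing how older rainfall is discounted as w grows.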
To determine the optimal parameters, this study used CHIRPS rainfall data from 1981 to 2018 for calibration and validation. By combining different values of t and w, we generated the antecedent effective rainfall index (ARI) series for 1981–2018. Since most landslide records are from 2008–2018, we used the 95th percentile of ARI from 1981–2007 as the historical threshold. Landslide data from the GLC database and 5000 additional non-landslide time-location points were selected, with 80% used for calibration and 20% for validation (validation results in Section 4.2). We calculated the ARI for each event under different parameters and used its exceedance of the historical threshold to predict landslides. The optimal t and w were determined by minimizing the Euclidean distance, whose calculation is described in Section 3.4. The parameter calibration results are shown in Figure 2. The Euclidean distance decreases sharply with increasing w, with the distance for the 3-day ARI initially much larger than that for the 7-day ARI. Beyond w = 2, the decrease levels off, and the 3-day ARI’s Euclidean distance becomes smaller than that of the 7-day ARI. At w = 2.3, the 3-day ARI’s Euclidean distance reaches its minimum of 0.3583, making 3 days and 2.3 the optimal combination.
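The parameter search described above amounts to a small grid search. The sketch below assumes a step of 0.1 for w (the text reports only the range 1 to 3 and the optimum 2.3) and takes a caller-supplied `distance_fn(days, w)` that returns the Euclidean distance for one parameter pair; both the function name and the callback are hypothetical.

```python
def calibrate(distance_fn, day_options=(3, 7), w_min=1.0, w_max=3.0, w_step=0.1):
    """Return (distance, days, w) for the pair with the smallest distance.

    ASSUMPTION: w is sampled on a regular grid with spacing w_step.
    """
    best = None
    n_steps = round((w_max - w_min) / w_step)
    for days in day_options:
        for k in range(n_steps + 1):
            # Rounding keeps grid values such as 2.3 free of float drift.
            w = round(w_min + k * w_step, 10)
            d = distance_fn(days, w)
            if best is None or d < best[0]:
                best = (d, days, w)
    return best
```

In practice `distance_fn` would compute the ARI for every calibration point, compare it with the historical threshold, and return the distance between the resulting (false alarm rate, hit rate) pair and the ideal point.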
3.4. Landslide-Prediction Model Evaluation
Landslide-detection results can be classified into four categories: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). True Positives are actual landslides correctly identified by the model. False Positives are non-landslides incorrectly identified as landslides by the model. True Negatives are non-landslides correctly identified by the model. False Negatives are actual landslides incorrectly identified as non-landslides by the model. Based on these categories, we can calculate evaluation metrics such as hit rate, false alarm rate, and Euclidean distance. The specific equations for these calculations are shown below.
(1) Hit Rate
The hit rate, also known as the True Positive Rate (TPR), is the proportion of correctly detected landslide events among all landslide events. The optimal value is 1. The calculation equation is as follows:
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
(2) False Alarm Rate
The false alarm rate, or False Positive Rate (FPR), is the proportion of non-landslide events incorrectly reported as landslides among all non-landslide events. The optimal value is 0. The calculation equation is as follows:
$$\mathrm{FPR} = \frac{FP}{FP + TN}$$
(3) Euclidean Distance
The optimal classification result occurs when the hit rate is 1 and the false alarm rate is 0. To better evaluate the performance of the landslide-prediction model, we use the Euclidean distance, which considers both the hit rate and the false alarm rate. The model performance is assessed by calculating the Euclidean distance (d) between the prediction results and the optimal value. A smaller distance indicates better model performance. The calculation equation is as follows:
$$d = \sqrt{(1 - \mathrm{TPR})^2 + \mathrm{FPR}^2}$$
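The three metrics follow directly from the confusion-matrix counts, as the short sketch below shows. The function names and the example counts are illustrative only and do not come from the study's results.

```python
from math import hypot


def confusion_counts(observed, predicted):
    """Count TP, FP, TN, FN from paired boolean lists (True = landslide)."""
    tp = sum(o and p for o, p in zip(observed, predicted))
    fp = sum((not o) and p for o, p in zip(observed, predicted))
    tn = sum((not o) and (not p) for o, p in zip(observed, predicted))
    fn = sum(o and (not p) for o, p in zip(observed, predicted))
    return tp, fp, tn, fn


def hit_rate(tp, fn):
    """Share of actual landslides that the model detected."""
    return tp / (tp + fn)


def false_alarm_rate(fp, tn):
    """Share of non-landslide points wrongly flagged as landslides."""
    return fp / (fp + tn)


def euclidean_distance(tpr, fpr):
    """Distance from (FPR, TPR) to the ideal point (0, 1)."""
    return hypot(1.0 - tpr, fpr)
```

For instance, 80 hits and 20 misses give a hit rate of 0.8; 50 false alarms among 1000 non-landslide points give a false alarm rate of 0.05; the resulting distance is about 0.206.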
(4) Spatial Buffer and Time Window
Before conducting landslide-risk estimation, it is essential to evaluate the model performance using landslide records. Historical landslide records may have inaccuracies, such as delayed recording times and location displacements. To address these, spatial buffers and time windows are established for model validation. Spatial buffers correct location errors, while time windows address timing inaccuracies. We use a spatial buffer radius of one grid cell. Time windows of 1, 3, and 7 days are selected. A 1-day window assumes no timing error. A 3-day window accounts for a possible 1-day deviation before or after the recorded time. A 7-day window allows for errors ranging from 5 days before to 1 day after the recorded time. The range of 5 days before and 1 day after is chosen because field records typically note the discovery day or a few days before, rather than after [33].
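The matching rule can be sketched as a lookup over a daily warning grid. The data layout (`warnings[day][row][col]` as nested boolean lists) and the function name are hypothetical, and the one-cell buffer is interpreted here as the square 3 × 3 neighbourhood around the recorded cell, which is an assumption.

```python
# Day offsets relative to the recorded date for each window length.
WINDOW_OFFSETS = {
    1: range(0, 1),    # recorded day only
    3: range(-1, 2),   # one day before to one day after
    7: range(-5, 2),   # five days before to one day after
}


def landslide_detected(warnings, day, row, col, window_days=7, buffer_cells=1):
    """True if any warning falls inside the time window and spatial buffer.

    warnings[d][r][c] is True where the model issued a warning on day d.
    ASSUMPTION: the buffer is a square neighbourhood of +/- buffer_cells.
    """
    n_days = len(warnings)
    n_rows = len(warnings[0])
    n_cols = len(warnings[0][0])
    for offset in WINDOW_OFFSETS[window_days]:
        d = day + offset
        if not 0 <= d < n_days:
            continue  # window extends beyond the available record
        for r in range(max(0, row - buffer_cells), min(n_rows, row + buffer_cells + 1)):
            for c in range(max(0, col - buffer_cells), min(n_cols, col + buffer_cells + 1)):
                if warnings[d][r][c]:
                    return True
    return False
```

A landslide recorded on day 5 is counted as detected under the 7-day window if a warning was issued in a neighbouring cell on day 2, but not under the 3-day window, which only covers days 4 to 6.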
5. Conclusions
This study projected rainfall-induced landslide risk in China using multi-scenario, multi-model climate data from CMIP6, with a focus on constructing rainfall-induced landslide threshold models and analyzing the spatiotemporal distribution of landslide risk.
Firstly, the study employed a fuzzy overlay model to integrate seven environmental disaster factors: slope, geological lithology, distance to fault zones, roads, water systems, urban expansion, and forest loss. This model calculates landslide sensitivity across China at a resolution of 1 km. Validation using historical landslide events shows that over 85% of these events fall within medium- to high-sensitivity areas. The sensitivity maps produced by this method effectively identify most landslide-prone areas and can be utilized in the development of subsequent landslide threshold models.
Additionally, different rainfall characterization methods were used to construct landslide threshold models. The threshold model based on the Antecedent Rainfall Index (ARI) showed higher accuracy and lower false alarm rates, proving more effective for landslide forecasting. Using this method, we derive 60 landslide thresholds from two landslide threshold models, two bias corrections, and 15 GCMs, which will be used for subsequent landslide risk prediction.
Finally, the study projected rainfall-induced landslide risk under multiple scenarios and models. In the early 21st century, the Qinghai–Tibet Plateau, Southwest, and parts of the Southeast regions showed a decrease in landslide risk of 5% to 10%, while other regions saw an increase of 5% to 20% compared to the reference period. By the mid-21st century, areas with decreasing risk continued to shrink, with most regions experiencing an increase of 10% to 40%. By the late 21st century, the nationwide risk had increased by more than 15%. Spatially, the relative increase in landslide risk grew gradually from east to west.