Monitoring Soil Salinity in Arid Areas of Northern Xinjiang Using Multi-Source Satellite Data: A Trusted Deep Learning Framework

Zhang, Mengli; Fan, Xianglong; Gao, Pan; Guo, Li; Huang, Xuanrong; Gao, Xiuwen; Pang, Jinpeng; Tan, Fei

doi:10.3390/land14010110


by Mengli Zhang ¹, Xianglong Fan ², Pan Gao ¹,*, Li Guo ¹, Xuanrong Huang ¹, Xiuwen Gao ¹, Jinpeng Pang ¹ and Fei Tan ¹,*

¹ College of Information Science and Technology, Shihezi University, Shihezi 832061, China

² Agricultural College, Shihezi University, Shihezi 832003, China

* Authors to whom correspondence should be addressed.

Land 2025, 14(1), 110; https://doi.org/10.3390/land14010110

Submission received: 2 December 2024 / Revised: 28 December 2024 / Accepted: 30 December 2024 / Published: 8 January 2025


Abstract:

Soil salinization affects agricultural productivity and ecosystem health in Xinjiang, especially in arid areas. The region’s complex topography and limited agricultural data emphasize the pressing need for effective, large-scale monitoring technologies. Therefore, 1044 soil samples were collected from arid farmland in northern Xinjiang, and the potential of soil salinity monitoring was explored by combining environmental variables with Landsat 8 and Sentinel-2 imagery. The study applied four feature selection algorithms: Random Forest (RF), Competitive Adaptive Reweighted Sampling (CARS), Uninformative Variable Elimination (UVE), and Successive Projections Algorithm (SPA). The selected variables were then used to build machine learning models (Ensemble Tree (ETree), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and LightBoost) and deep learning models (Convolutional Neural Networks (CNN), Residual Networks (ResNet), Multilayer Perceptrons (MLP), and Kolmogorov–Arnold Networks (KAN)). The results suggest that fertilizer use plays a critical role in soil salinization processes. Notably, the interpretable model KAN achieved an accuracy of 0.75 in classifying the degree of soil salinity. This study highlights the potential of integrating multi-source remote sensing data with deep learning technologies, offering a pathway to large-scale soil salinity monitoring and thereby providing valuable support for soil management.

Keywords:

neural network; multi-source satellite data; interpretable deep learning; Google Earth Engine

1. Introduction

Soil salinization, a global agricultural and environmental challenge, has far-reaching adverse effects on soil properties. It not only restricts crop growth and development but also reduces microbial activity and fertility in the soil, thereby disturbing the equilibrium of ecosystems and posing a major threat to plant health and agricultural output [1]. Xinjiang is an arid and semi-arid region with many types of saline soil spread over a wide area, making it a typical inland saline region. Data from the second national soil census show that the total area of saline soil in Xinjiang reaches 13.361 million hectares, accounting for 36.8% of the total saline soil area in the country, so timely treatment and long-term planning are urgently needed [2]. The main causes of salinization include climatic factors, such as low rainfall and high temperatures, accumulation of soluble salts in the soil due to hydrogeology, poorly maintained irrigation systems, and unscientific cultivation practices [3]. These factors act together to exacerbate the accumulation of soil salts, seriously affecting agricultural production and the ecological environment [4]. Therefore, it is important to formulate effective management measures and sustainable agricultural strategies to address soil salinization in Xinjiang. The problem is particularly acute in the agriculturally important areas of northern Xinjiang, China, owing to the geographical conditions of the inland arid zone. Sparse rainfall and intense evaporation make it easy for salts to accumulate at the soil surface and form saline soils, which not only hinders the healthy growth of crops but also reduces the availability of key soil nutrients, directly affecting crop yield and quality. In addition, saline soils make land management more difficult and costly, and the vast and varied landscapes of Xinjiang make it hard to provide accurate and timely information to support agricultural production.

The advancement of precision agriculture has opened the way for remote sensing technology to be applied to monitoring salinity in agricultural soils. Remote sensing by drones has become a highlight of the field because of its flexibility and high accuracy [5]; however, its range is often limited by flight endurance and speed, and it is mainly suitable for monitoring small areas. In contrast, satellite remote sensing, with its wide-area coverage and high-frequency updates, is gaining attention for tracking agricultural dynamics and assisting in strategic planning. In particular, the Sentinel-2 satellites launched by the European Space Agency (ESA) provide more in-depth remote sensing support for agriculture and other fields through significantly improved temporal, spatial, and spectral resolution. In addition, the Landsat series of satellites, with decades of accumulated remote sensing data, has irreplaceable value for analyzing long-term environmental change and land use dynamics. Integrating multi-source satellite data effectively enhances the recognition accuracy of surface features [6,7]. A study by Wang combined data from unmanned aerial vehicles (UAVs) and Sentinel-2A to construct models for regions with different salt concentrations, improving the accuracy of salt inversion [8]. Zhang’s study [9] further verified that the fusion of multi-source satellite data enriches data dimensions and significantly reduces uncertainty in sea ice thickness assessments. Similarly, the work by El-Rawy [10,11] revealed the effectiveness of multi-source satellite data in assessing soil conductivity and salt content across temporal and spatial scales. In summary, the integration of multi-source satellite data not only strengthens soil property monitoring but also lays a solid information foundation for the practice of smart agriculture.

The rapid progress of pedological remote sensing technology has greatly contributed to the level of accuracy in soil salinity detection. By combining multispectral imagery with ground characteristics, soil salinity can be effectively assessed and salinity indicators defined [12,13]. The selection of appropriate spectral indices for soil monitoring is a critical step in ensuring data accuracy in the study area, given its unique topography and ecosystem characteristics [14]. With the help of soil texture analyses, Duan et al. aim to differentiate between different geospatial layouts of soils and provide a strategic orientation for the efficient management of soil resources [15]. Furthermore, the integration of environmental covariates, such as topographic relief, soil type, and vegetation cover, can greatly improve the predictive effectiveness of soil salinity monitoring models, especially in arid and semi-arid regions [16]. For example, Zhao et al. verified the efficacy of a variety of auxiliary variables in assessing soil conductivity and revealed that the inclusion of topographic elements can substantially improve the accuracy of the model in a case study in Karamay, Xinjiang [17]. The work of Emami and other scholars, on the other hand, was designed to assist in the planning of soil management strategies in northern Iran by identifying the elements of environmental influence that dominate the distribution of soil salinity in the region [18]. In summary, this integrated multi-source information assessment method, by linking spectral data analysis and environmental variables, greatly enhances the accuracy of the soil salinity monitoring model, enriches the information content of the decision support system, and lays a solid foundation for the scientific formulation of soil management planning and intervention [19].

Machine learning techniques are now widely used for soil salinity prediction and show significant advantages over traditional methods in dealing with complex nonlinear relationships. Among them, ensemble learning has attracted much attention for its ability to handle high-dimensional data and enhance the generalization ability of the model [20]. Several studies support the conclusion that Random Forest, a robust ensemble learning strategy, is widely used in soil salinity assessment and is particularly suitable for bare soil environments [21,22,23]. Nevertheless, machine learning models still suffer from insufficient interpretability and a high dependence on large data volumes in practical applications; their performance is compromised when samples are scarce or data quality is low, and they are susceptible to data noise and outliers. It has been shown that machine learning methods incorporating SHapley Additive exPlanations (SHAP) value analysis can significantly improve the explanatory power of the model and help to gain a deeper understanding of the contribution of each feature variable to model construction [24]. On the other hand, neural network algorithms have demonstrated more stable performance and higher accuracy on the salt prediction task [25]. In particular, a hierarchical neural network architecture built for fine-grained classification of soil salinity levels outperformed traditional machine learning methods and improved prediction accuracy [26]. In addition, neural network techniques have shown superior performance compared with machine learning techniques in mapping soil salinity distribution [27].

Existing studies mainly focus on a single data source, which cannot fully reflect the complexity of soil salinity changes. This study overcomes this limitation by integrating data from multiple sources and considering the effects of environmental factors on soil. An assessment method was also used to ensure the reliability of the model and the credibility of the prediction results. The method has considerable potential for application in soil management in arid regions and can provide more accurate data support for decision-makers. Accordingly, this study aims to explore the feasibility of an interpretable modeling approach that combines multi-source satellite imagery (Landsat 8 and Sentinel-2) with environmental parameters to monitor soil salinization at a large scale in the farmland of northern Xinjiang. Specifically, there are three research objectives:

(1): To systematically assess and improve the accuracy of soil salinity prediction models by integrating multi-source satellite data and a series of environmental auxiliary variables, and to identify the characteristic variables affecting soil salinity;
(2): To accurately assess the degree of soil salinity in arid farmland in northern Xinjiang and ensure the credibility of the results;
(3): To apply modules with good interpretability in the modeling process to enhance the learning effectiveness of the model.

2. Materials and Methods

2.1. Overview of the Study Area

Xinjiang is situated in the northwestern part of China, as illustrated in Figure 1. The total area of arable land in Xinjiang is 7,038,600 hectares, of which paddy fields account for 0.85 percent, irrigated land for 96.00 percent, and dry land for 3.15 percent. The climate in northern Xinjiang is temperate continental arid to semi-arid, with low average annual rainfall, usually between 100 and 200 mm, but varying according to topographical differences. For example, annual rainfall is about 150 mm in the Tarim Basin and up to 200 mm in parts of the southern foothills of the Tianshan Mountains. Rainfall is mainly concentrated between July and September, accounting for more than 70 percent of the annual total. This uneven spatial and temporal distribution, combined with high evapotranspiration, leads to insufficient soil moisture and exacerbates soil salinization. The wet season in northern Xinjiang usually occurs from June to September, with most of the rainfall concentrated in July and August. During this period, average temperatures range from 25 °C to 35 °C, with higher relative humidity than in the dry season. The main cash crop in the area is cotton. These climatic conditions result in soils that are prone to salinization, which presents a challenge to agricultural production and increases the complexity of agricultural management.

The sample sites were selected on the basis of the distribution of cotton fields in northern Xinjiang, which are mainly located in the fifth, sixth, seventh, and eighth divisions. Using the cotton field distribution data released by the Bureau of Agriculture, together with the digital elevation model (DEM) and topographic maps, we selected the main cotton planting areas of these four divisions as the base area for sample site selection. Within each division, stratified random sampling was used to select representative sample points, and their final locations were determined through field verification and adjustment. This process ensured a sound distribution of the sample points and the reliability of the data, providing solid data support for the subsequent soil salinity analysis. After collection, the soil samples were stored in separate bags in sample order for subsequent experiments; the sampling operation is shown in Figure 2.

2.2. Collection and Analysis of Soil Samples

In this study, soil sampling was conducted in April 2021 in the primary cotton cultivation regions of the fifth, sixth, seventh, and eighth divisions in northern Xinjiang. Samples were taken using the five-point (plum-blossom) sampling method, with sampling points located by GPS, and a total of 1044 soil samples were collected at a depth of 0–30 cm. Each soil sample was combined with distilled water in a 1:5 ratio and agitated in a thermostatic oscillator for 30 min to facilitate sufficient dissolution. The mixture was then left to stand until the supernatant separated, and the electrical conductivity (EC, dS/m) of the soil was measured using a conductivity meter (Model S230, Mettler Toledo, Shanghai, China). Three replicate measurements were made for each sample and averaged. The soil salt content (SSC) was then calculated according to the established empirical formula in Equation (1) [28], and the soil salinity levels were classified according to the criteria outlined in Table 1 [29].

y = \frac{(x + 41.2543) \times 5}{2120.76}

(1)

In this equation, y denotes the SSC in g/kg and x denotes the electrical conductivity (EC, μS/cm) of the soil.
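As a worked illustration of Equation (1), the conversion can be sketched in Python. The function name `ec_to_ssc` and the triplicate readings are hypothetical, the EC input is assumed to be in μS/cm as stated above, and averaging the replicates before conversion is an assumption about the order of operations.

```python
from statistics import mean


def ec_to_ssc(ec_us_cm: float) -> float:
    """Equation (1): soil salt content (g/kg) from EC (assumed uS/cm)."""
    return (ec_us_cm + 41.2543) * 5 / 2120.76


# Hypothetical triplicate EC readings for one sample (uS/cm).
replicates = [980.0, 1000.0, 1020.0]
ssc = ec_to_ssc(mean(replicates))
print(f"SSC = {ssc:.3f} g/kg")
```

For a mean EC of 1000 μS/cm, the formula gives an SSC of roughly 2.45 g/kg.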

2.3. Acquisition and Processing of Satellite Images from Sentinel-2 and Landsat 8

First, the vector maps were merged according to the study area in northern Xinjiang, and the resulting shapefile was imported into the Google Earth Engine (GEE) cloud platform (https://earthengine.google.com/ (accessed on 8 March 2024)) and loaded with ‘ee.FeatureCollection’, which aggregates all the elements in the shapefile. Sentinel-2 and Landsat 8 data were acquired within the GEE environment, and both data sources were subjected to cloud removal. For Sentinel-2, the QA60 band, which indicates whether each pixel is covered by clouds, was used to mask clouds and cloud shadows. For Landsat 8, the CLOUD_COVER parameter was set to 60, thereby excluding images with more than 60% cloud coverage. The April 2021 images from the two satellites were then composited over the study area using the median synthesis method and downloaded, Sentinel-2 at 10 m resolution and Landsat 8 at 30 m resolution, and the spectral curves for the area were acquired in GEE. These data are presented in Figure 3. Subsequently, the 30 m resolution Landsat 8 image was resampled to 10 m in ArcGIS software, version 10.2.
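The cloud screening and median compositing described above run inside GEE. The following NumPy sketch only illustrates the underlying logic on toy arrays and is not the authors' script. It assumes the Sentinel-2 QA60 convention in which bit 10 flags opaque clouds and bit 11 flags cirrus; all function names, scene records, and reflectance values are hypothetical.

```python
import numpy as np

CLOUD_BIT = 1 << 10   # QA60 bit 10: opaque clouds (assumed convention)
CIRRUS_BIT = 1 << 11  # QA60 bit 11: cirrus clouds (assumed convention)


def clear_mask(qa60: np.ndarray) -> np.ndarray:
    """True where neither cloud bit is set."""
    return ((qa60 & CLOUD_BIT) == 0) & ((qa60 & CIRRUS_BIT) == 0)


def median_composite(stack: np.ndarray, qa_stack: np.ndarray) -> np.ndarray:
    """Per-pixel median over time, ignoring cloudy observations.

    stack and qa_stack have shape (time, rows, cols).
    """
    masked = np.where(clear_mask(qa_stack), stack.astype(float), np.nan)
    return np.nanmedian(masked, axis=0)


# Scene-level filter analogous to CLOUD_COVER = 60 (hypothetical metadata).
scenes = [{"id": "a", "CLOUD_COVER": 12.0}, {"id": "b", "CLOUD_COVER": 75.0}]
kept = [s["id"] for s in scenes if s["CLOUD_COVER"] <= 60]

# Three acquisitions of a 1 x 2 scene; the second has a cloudy first pixel.
stack = np.array([[[0.10, 0.30]], [[0.90, 0.32]], [[0.12, 0.34]]])
qa = np.array([[[0, 0]], [[1024, 0]], [[0, 0]]], dtype=np.uint16)
composite = median_composite(stack, qa)
```

The cloudy observation (0.90) is excluded, so the composite of the first pixel is the median of the two clear values.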

2.4. Environment Variable Selection

To assess soil salinity in depth, this study builds on previous research [30,31,32]. A total of 23 environmental covariates were extracted. Key terrain parameters, including elevation, slope, aspect, curvature, hillshade, terrain relief, and terrain roughness, were derived from the Digital Elevation Model (https://earthengine.google.com/ (accessed on 11 April 2024)). Nighttime light data were sourced from the Earth Observatory (https://earthobservatory.nasa.gov/ (accessed on 13 April 2024)). Soil type, soil parent material, clay, sand, and silt attributes were taken from the spatial distribution data of China’s soil types on the Digital Earth Open Platform (https://open.geovisearth.com/ (accessed on 15 April 2024)). The data on air temperature and rainfall were obtained from the National Meteorological Science Data Center (https://data.cma.cn/ (accessed on 16 April 2024)). The population distribution data were obtained from the Resource and Environment Science Data Platform (https://www.resdc.cn/ (accessed on 20 April 2024)). Agricultural film data for the study area were obtained from the Xinjiang Bureau of Statistics for the year 2021 (http://tjj.xinjiang.gov.cn/tjj/xjq/list_dq.shtml (accessed on 25 April 2024)). The data on railroad density and highway density were sourced from the OpenStreetMap website (https://www.openstreetmap.org/ (accessed on 10 May 2024)). The data on nitrogen, phosphorus, potash, and compound fertilizer usage in the study area were sourced from the China Statistical Yearbook (http://www.tjcn.org/ (accessed on 12 May 2024)), which provides detailed agricultural statistics, including regional fertilizer application data. The tasseled cap transformation indices were derived from the satellite remote sensing images through a linear transformation of the spectral bands. A detailed account of the variables selected for this study can be found in Table 2.

Specifically, we selected 23 environmental covariates that encompass key aspects such as soil salinization detection, vegetation health assessment, and analysis of topography and hydrological conditions. This comprehensive selection aims to accurately reflect the spatial distribution and influencing mechanisms of soil salinization in the cotton fields of northern Xinjiang. The selected covariates include the Salinity Index (Salinity Index 1–6), Salt Index (Salt Index 1–3), and Normalized Difference Salinity Index (NDSI). These indices combine reflectance from different remote sensing bands to sensitively capture the accumulation and distribution of salts in the soil, providing direct data for salinization monitoring. Additionally, the Intensity Index (Intensity Index 1–2) quantifies the severity of soil salinization by reflecting the concentration levels of salts, thereby assisting in the assessment of salinization intensity. Vegetation indices such as NDVI, ENDVI, EVI, EEVI, and GDVI indirectly reflect the impact of salinization on vegetation growth by evaluating vegetation coverage and health. High salinity environments typically inhibit vegetation growth, leading to a decrease in these index values. Soil indices, including Soil Difference Index (SDI) and Tasseled Cap indices (TCB, TCG, TCW), analyze soil and surface brightness, greenness, and wetness to reveal the influence of salt accumulation and moisture conditions on salinization. Furthermore, the Combined Spectral Response Index (Canopy Salt Response Index and Combined Spectral Response Index) integrates information from multiple spectral bands to enhance the detection of salinization characteristics under complex surface conditions, thereby improving monitoring accuracy. 
The selection of these environmental covariates was based on their high sensitivity and specificity in remote sensing monitoring, effectively capturing spectral changes, vegetation responses, and dynamic variations in topography and hydrological conditions during the salinization process. This robust selection of covariates provides a solid scientific foundation for the spatial analysis and mechanistic study of soil salinization. By comprehensively utilizing these variables, our study is able to thoroughly and accurately assess the spatial distribution and influencing mechanisms of soil salinization in the cotton fields of northern Xinjiang, ensuring the reliability and scientific validity of the research results.
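To make the band-ratio construction of such indices concrete, the sketch below computes two normalized-difference indices from red and near-infrared reflectance. It assumes the commonly used definitions NDVI = (NIR − Red)/(NIR + Red) and NDSI = (Red − NIR)/(Red + NIR); the exact formulas adopted in this study are those listed in Table 2 and may differ. The helper name and reflectance values are hypothetical.

```python
import numpy as np


def normalized_difference(a, b) -> np.ndarray:
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


# Hypothetical surface reflectance for two pixels.
red = np.array([0.20, 0.10])
nir = np.array([0.30, 0.50])

ndvi = normalized_difference(nir, red)  # vegetation index (assumed definition)
ndsi = normalized_difference(red, nir)  # salinity index (assumed definition)
```

Under these definitions NDSI is simply the negative of NDVI, which is consistent with the statement above that salinity tends to lower vegetation index values.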

2.5. Feature Selection Methods

In the field of spectral research for salt monitoring, we constructed feature sets through a systematic screening process that fuses spectral and environmental variables and employs four state-of-the-art feature selection techniques. These carefully constructed datasets are then deployed into machine learning and deep learning models, with the aim of revealing the impact of different feature extraction strategies on improving model prediction accuracy through detailed performance comparisons, and identifying the most efficient modeling pathways for salt monitoring. A total of four feature selection algorithms were adopted in this study: SPA [40], CARS [41], UVE [42], and RF [43]. The SPA is able to reduce redundancy among variables while selecting the most characterizing variables. CARS, which draws on the integration of competitive mechanisms with Monte Carlo and partial least squares regression, dynamically adjusts the spectral variable weights and optimizes the variable clusters. UVE excels at filtering out data noise, demonstrating efficiency and directness in high dimensional data analysis. The Random Forest algorithm, which filters variables based on their importance ratings, defines a benchmark for differentiating the importance of a variable by considering those variables whose ratings exceed the mean as key variables to be included in the subsequent model construction and analysis.
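The mean-importance rule described for Random Forest can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the study's configuration: the number of trees, the random seeds, and the toy response are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: only the first two of six features drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importances = forest.feature_importances_

# Keep the variables whose importance exceeds the mean importance.
selected = np.flatnonzero(importances > importances.mean())
print(selected)
```

Because the importances sum to one, the mean threshold equals 1/p for p features, so a variable is retained only if it carries more than an equal share of the total importance.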

The CARS algorithm simulates natural selection mechanisms, effectively reducing redundancy in high-dimensional data. However, its high computational cost makes it suitable primarily for applications with moderate sample sizes and stringent precision requirements. The UVE algorithm, based on Partial Least Squares Regression (PLSR), offers high stability and strong interpretability, particularly excelling in scenarios with multicollinearity, thereby enhancing both model interpretability and predictive accuracy. The SPA employs vector projection analysis to identify candidate wavelengths with the largest projection vectors, ultimately determining a combination of feature wavelengths that effectively minimizes redundancy and collinearity, thereby improving model performance. Nevertheless, the performance gains are limited due to the unsupervised nature of the feature selection process, which constrains the interpretability of the selected variables. The RF algorithm, as an ensemble learning method, is adept at handling complex nonlinear relationships and significantly enhancing predictive accuracy, albeit at the expense of substantial computational resource consumption. By integrating multiple feature selection methods, it is possible to more effectively analyze the impact of environmental covariates on soil salinization, thereby increasing the accuracy and reliability of predictive models and providing robust technical support for related research.
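The projection step of SPA described above can be sketched in NumPy. The sketch covers only the projection phase for a fixed starting variable and a fixed number of variables; the full algorithm also compares candidate subsets by validation error. Column centering, the function name, and the toy data are assumptions.

```python
import numpy as np


def spa_select(X: np.ndarray, n_select: int, start: int = 0) -> list[int]:
    """Projection phase of SPA: pick columns with minimal collinearity."""
    Xp = (X - X.mean(axis=0)).astype(float)  # column centering (assumption)
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]].copy()
        # Project every column onto the subspace orthogonal to v.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never re-select a chosen column
        selected.append(int(np.argmax(norms)))
    return selected


# Toy data: columns 1 and 3 are (nearly) collinear with column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2.0 * a, b, a + 0.01 * rng.normal(size=200)])
chosen = spa_select(X, n_select=2, start=0)
```

Starting from column 0, the collinear columns shrink to near-zero norm after projection, so the independent column 2 is chosen next.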

2.6. Modeling

To assess the effectiveness of machine learning and deep learning for salt monitoring, this study incorporated the SHAP method into the construction of the machine learning models, which enhances their explanatory power, deepens the understanding of their decision-making mechanism, and thereby improves their transparency and credibility. For deep learning, the recently introduced KAN framework, a deep learning tool with built-in interpretation functions, was adopted; it helps researchers better understand and verify the underlying logic of the model predictions, supporting the rigor and practical value of the model. By combining multi-source satellite data with these techniques, this study is able to track and predict soil salinity more effectively, providing scientific support for soil quality improvement and crop yield increase.

2.6.1. Machine Learning Models

Machine learning models are widely used to tackle complex data mining challenges because of their ability to learn directly from data. These algorithms excel at revealing nonlinear associations and can handle a variety of regression and classification tasks effectively. The models were selected on the following grounds. The two gradient boosting models, LightBoost and XGBoost, handle high-dimensional data and complex nonlinear relationships well; the former is known for its efficiency and the latter has an advantage in speed, and using both aims to improve prediction accuracy. RF reduces the risk of overfitting through ensemble learning, provides a robust assessment of feature importance, and enhances model interpretability. ETree (ET), on the other hand, was selected for its parallel computing capability and its advantages in handling high-dimensional data, and was used to compare the performance of different tree models. For the soil salinity prediction task, these four models were trained: ET [44], RF, XGBoost [45], and LightBoost [46].
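As a minimal sketch of how one of these tree ensembles might be tuned with Scikit-Learn's grid search (the approach named in Section 2.7), the example below fits an `ExtraTreesRegressor` on synthetic data. Treating `ExtraTreesRegressor` as the counterpart of the ET model is an assumption, as are the parameter grid, the five-fold setting, and the toy data.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the selected feature set.
rng = np.random.default_rng(42)
X = rng.uniform(size=(200, 5))
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.05, size=200)

# Hypothetical grid; the study's actual search space is not reported here.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    ExtraTreesRegressor(random_state=0), param_grid, cv=5, scoring="r2"
)
search.fit(X, y)
best = search.best_params_
print(best, round(search.best_score_, 3))
```

The same pattern applies to the other tree models by swapping the estimator and its grid.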

2.6.2. Deep Learning Models

Deep learning neural networks process input data and generate output by emulating the multi-layered neuronal structure of the human brain. This structure enables them to process complex, high-dimensional data, map it to a low-dimensional space, extract key information, and optimize decision-making. Among the deep learning models, 1D-CNN can effectively process sequence data and capture spatial hierarchies, thus extracting the key information needed to predict soil salinity. The residual connections in 1D-ResNet help overcome the vanishing gradient problem in training deep networks, which makes it possible to build deeper networks, learn more complex data features, and improve prediction accuracy and stability. MLP, with its concise yet powerful nonlinear modeling capability, can capture the complex relationships in soil salinity data and serves as a baseline for performance comparison with the other deep learning models.

The convolutional layers of 1D-CNN (https://github.com/poloclub/cnn-explainer (accessed on 20 May 2024)) are capable of efficiently processing sequential information and capturing spatial hierarchies. The 1D-ResNet (https://github.com/KaimingHe/deep-residual-networks (accessed on 26 May 2024)) is renowned for its intricate structure and is adept at facilitating deeper learning through the utilization of residual connections. The MLP (https://github.com/filipecalasans/mlp (accessed on 26 May 2024)) is distinguished by its straightforward architectural design and robust nonlinear modeling capabilities, which enable the capture of intricate data relationships. In this study, the aforementioned classic network structures are applied to the soil salinity regression prediction. However, the opaque structure of black-box models hinders the interpretability of deep learning and impairs the transparency of the decision-making process within the model. The issue of improving the interpretability of deep models is also a topic of considerable current research interest.
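The idea of a residual connection can be illustrated with a single 1D residual block in NumPy, computing y = ReLU(x + F(x)) with F a convolution, ReLU, convolution stack. This is a didactic sketch with hand-set kernels, not the 1D-ResNet architecture used in the study; all names and values are hypothetical.

```python
import numpy as np


def conv1d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 1D cross-correlation returning an output as long as x."""
    return np.convolve(x, kernel[::-1], mode="same")


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x: np.ndarray, k1: np.ndarray, k2: np.ndarray) -> np.ndarray:
    """y = ReLU(x + F(x)), where F is conv -> ReLU -> conv."""
    fx = conv1d_same(relu(conv1d_same(x, k1)), k2)
    return relu(x + fx)  # the skip connection adds the input back


x = np.array([0.2, -0.1, 0.5, 0.3])    # hypothetical feature sequence
identity = np.array([0.0, 1.0, 0.0])   # kernel that leaves the input unchanged
zeros = np.zeros(3)                    # kernel that switches F off

out_identity = residual_block(x, identity, identity)
out_skip_only = residual_block(x, zeros, zeros)
```

With zero kernels the block reduces to ReLU(x), showing that the input still passes through the skip path even when the learned branch contributes nothing.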

2.6.3. Interpretable Deep Learning Model-KAN

The recently introduced KAN model is an alternative to the traditional MLP in the field of neural networks that offers a high degree of flexibility in parameter tuning. With regard to parameter training, KAN has a distinct advantage over MLP, which necessitates retraining when parameters are modified. The network is named after Kolmogorov and Arnold, and its core idea derives from the Kolmogorov–Arnold Representation Theorem (KART), which states that any multivariate continuous function on a bounded domain can be represented as a finite composition of continuous univariate functions and addition. In KAN, learnable spline functions replace the weight parameters of traditional neural networks. Spline functions are highly flexible and adjustable, enabling effective fitting of complex data relationships, thereby reducing approximation errors and enhancing the network’s capability to learn subtle patterns from high-dimensional data. The general form of a KAN spline can be written using B-splines:

\mathrm{Spline}(x) = \sum_{i} c_{i} B_{i}(x)

(2)

Here, \mathrm{Spline}(x) denotes the spline function, c_{i} are coefficients optimized during the training process, and B_{i}(x) are B-spline basis functions defined on a grid. The grid points define the intervals on which each basis function B_{i} is active and significantly influence the shape and smoothness of the spline. During training, the shape of the spline function is adjusted by optimizing the loss function to best fit the training data, and the spline parameters are updated at each iteration to reduce prediction errors. KAN, whose open-source implementation can be found at https://github.com/KindXiaoming/pykan (accessed on 31 May 2024), enables parameter scaling by introducing a spline function-based network structure without having to go through the training process again. By introducing simplified functions and transformation steps, we are able to gain a deeper understanding of the intrinsic mechanisms of KAN and provide more precise mathematical descriptions of the learned functions. This feature makes it possible to build deep networks by stacking multiple layers of KAN to cope with complex problems more efficiently, with each layer tailored to the specific goals of the task.
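Equation (2) can be evaluated numerically with the Cox–de Boor recursion for the B-spline basis. The NumPy sketch below is independent of the pykan implementation; the uniform grid, the cubic degree, and the coefficient values are hypothetical choices for illustration.

```python
import numpy as np


def bspline_basis(x, knots, degree: int) -> np.ndarray:
    """All basis functions B_i(x), shape (len(x), len(knots) - degree - 1)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).astype(float)
    for k in range(1, degree + 1):
        n = len(t) - k - 1
        new = np.zeros((len(x), n))
        for i in range(n):
            d1 = t[i + k] - t[i]
            d2 = t[i + k + 1] - t[i + 1]
            if d1 > 0:
                new[:, i] += (x - t[i]) / d1 * B[:, i]
            if d2 > 0:
                new[:, i] += (t[i + k + 1] - x) / d2 * B[:, i + 1]
        B = new
    return B


def spline(x, coeffs, knots, degree: int) -> np.ndarray:
    """Equation (2): Spline(x) = sum_i c_i * B_i(x)."""
    return bspline_basis(x, knots, degree) @ np.asarray(coeffs, dtype=float)


knots = np.arange(-3.0, 8.0)          # uniform grid, extended beyond [0, 4)
x = np.linspace(0.0, 3.9, 9)          # evaluation points inside the domain
coeffs = np.array([0.0, 0.5, 1.0, 0.2, -0.3, 0.4, 0.1])  # hypothetical c_i
values = spline(x, coeffs, knots, degree=3)
```

Inside the domain the basis functions sum to one at every point, so changing a single coefficient c_i only reshapes the spline locally on the intervals where B_i is active.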

This pioneering strategy not only improves the model’s adaptability to various tasks, but also simplifies the path to increasing the depth of the model, providing a powerful tool for overcoming difficult challenges in machine learning practice. The unique network architecture of KAN is illustrated in Figure 4.

2.7. Data Processing Steps

First, in data preprocessing, the interquartile range (IQR) method was used to identify outliers and screen them out of the dataset. Because putting the features on a common, dimensionless scale has a significant effect on model accuracy, the data were then standardized with the RobustScaler method, which subtracts the median and divides by the interquartile range. This scaling limits the influence of any remaining extreme values while retaining the relative relationships between the data. For the machine learning algorithms, grid search was used to obtain the optimal parameter configuration, and each model was built with Python's open-source machine learning library Scikit-Learn. For the neural networks, ten-fold cross-validation was used to ensure the generalization ability and stability of the models, and key parameters, such as the learning rate and batch size, were tuned based on the performance of the model in each fold.
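The two preprocessing steps can be sketched as follows. The example is written in plain Python rather than with Scikit-Learn's RobustScaler so that the arithmetic is visible. The 1.5 × IQR fence, the percentile interpolation rule, and the sample values are assumptions made for illustration; the study does not state which settings it used.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def iqr_filter(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed default."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


def robust_scale(values: Sequence[float]) -> List[float]:
    """Subtract the median and divide by the interquartile range."""
    median = percentile(values, 50)
    spread = percentile(values, 75) - percentile(values, 25)
    if spread == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    return [(v - median) / spread for v in values]


# Made-up salinity values with one obvious outlier (12.0).
salinity = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 2.4, 12.0]
kept = iqr_filter(salinity)
scaled = robust_scale(kept)
print(kept)    # the value 12.0 is screened out
print(scaled)  # the median sample maps to 0.0
```

Because the median and the IQR are themselves insensitive to a few extreme values, the scaled features keep their ordering and relative spacing even when some unusual samples remain after filtering.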

2.8. Model Evaluation Method

For the regression task of predicting soil salt content, we used the coefficient of determination (R², Equation (3)) and the root mean square error (RMSE, Equation (4)). R² measures how much of the variation in the data the model explains, i.e., how well the model fits the dataset, while RMSE describes the difference between the predicted and measured values and expresses the magnitude of the prediction error. For the classification of salinity degree, Accuracy (Equation (5)), Precision (Equation (6)), Recall (Equation (7)), and F1-score (Equation (8)) were used to evaluate how well the models distinguish between salinity classes. Accuracy is the proportion of all samples that the model classifies correctly and provides a basic overview of performance. Precision is the proportion of samples predicted to be positive that are actually positive, so it decreases as false positives increase, while Recall is the proportion of actual positive samples that are correctly predicted to be positive, reflecting the model's ability to recognize positive samples. The F1-score is the harmonic mean of Precision and Recall and is particularly suitable for datasets with imbalanced classes, providing a more comprehensive picture of overall performance. Together, these metrics allow the overall performance of the models to be evaluated and optimized.

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} \tag{3}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n}} \tag{4}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{5}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$\mathrm{F1} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{8}$$

The variables $y_{i}$, $\bar{y}$, and $\hat{y}_{i}$ represent the measured, average, and predicted values of the salt content, respectively. The terms True Positives ($TP$), True Negatives ($TN$), False Positives ($FP$), and False Negatives ($FN$) are used to quantify the number of true positive, true negative, false positive, and false negative examples, respectively.
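Equations (3)-(8) translate directly into code. The sketch below uses plain Python with hypothetical function names and made-up inputs. The classification metrics are computed for a single class treated as "positive" (one-vs-rest), which is an assumption about how the binary definitions above would be applied to a multi-class salinity problem.

```python
import math
from typing import Dict, Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Equation (3): 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Equation (4): square root of the mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Equations (5)-(8) for one class treated as positive (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Made-up values for illustration only.
print(r2_score([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
print(rmse([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
print(classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Returning 0.0 when a denominator is zero is a convention chosen here to keep the sketch robust; other conventions exist.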

2.9. Flow Chart

The overall workflow of this study is illustrated in Figure 5. First, soil samples and salinity data were collected from the designated study area. Second, images from the two satellites were acquired. Third, the terrain parameters and other characteristic variables were obtained. Finally, the characteristic dataset was modeled, and the interpretability of the models was enhanced through visualization techniques.

3. Results

3.1. Statistical Distribution of Soil Salinity in Northern Xinjiang

A classification study of salinization in agricultural soils in northern Xinjiang suggests that local soil management measures are effective. This conclusion is based on the analysis of 1044 sampling sites, most of which were non-saline or only slightly saline; only a small proportion had moderately, highly, or extremely saline soils. Factors contributing to the success of these measures include improved irrigation management, optimized fertilizer application, and targeted drainage efforts, all of which mitigate excessive salt accumulation. Table 3 details the variation in soil salinity and other attributes across the study area. The statistical analysis of the salinity measurements gives a mean of 1.93 and a standard deviation of 1.67, so the data points are widely dispersed around the mean. The reported standard error is 1.07, indicating that the sample-based estimate of mean salinity is subject to considerable uncertainty. The data exhibit clear right (positive) skewness, with a skewness value of 1.76: most samples have low salinity, while a smaller number of high values form a long upper tail. The kurtosis of 3.71 indicates that the distribution is more peaked than a normal distribution and contains more extreme values. Furthermore, the coefficient of variation of 0.87 indicates a relatively high degree of variation in the salt content data. Table 4 shows the proportion of soil samples at each degree of salinization. The salinity data were classified into five levels according to salt concentration: non-saline, slightly saline, moderately saline, highly saline, and extremely saline.
The results demonstrate that the majority of samples (65.7%) are non-saline, indicating that plots with low salinity predominate in the observation area. A further 22.2% of the samples are slightly saline, suggesting that the salinity level in some areas may have increased slightly. Moderately saline samples account for 8.9%, and highly and extremely saline samples for 2.9% and 0.3%, respectively, so pronounced salinization is confined to a small share of the sampled sites. We compared the statistical distribution of soil salinity in cotton fields in northern Xinjiang with data from other studies [47,48]. This comparison revealed a reduction in the degree of soil salinization within farmland areas, which is closely related to factors such as climate change and modifications in irrigation management practices. The soil salinity concentrations observed in this study are generally lower than those reported in historical data from Xinjiang, indicating that recently implemented irrigation and drainage measures have mitigated soil salinization. However, ongoing monitoring remains essential to prevent soil salinization from worsening and to support the timely development of management strategies.
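The descriptive statistics quoted above can be reproduced from raw measurements with a few lines of code. The sketch below uses moment-based definitions of skewness and kurtosis (for which a normal distribution gives 0 and 3, respectively). Whether the study used these or bias-corrected or excess variants is not stated, so this is an assumption, and the sample values are made up. The last lines only re-check arithmetic on figures reported in the text.

```python
from typing import Sequence


def central_moment(values: Sequence[float], order: int) -> float:
    """Mean of (v - mean)^order over the sample."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values: Sequence[float]) -> float:
    """Moment-based skewness: positive when the upper tail is longer."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values: Sequence[float]) -> float:
    """Moment-based kurtosis; a normal distribution gives 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Made-up sample: mostly low values plus one high value, i.e. right-skewed.
sample = [1.0, 1.0, 1.0, 2.0, 5.0]
print(skewness(sample) > 0)  # True: long upper tail

# Arithmetic checks on figures reported in the text.
reported_cv = 1.67 / 1.93            # standard deviation / mean, rounds to 0.87
class_shares = {"non-saline": 65.7, "slightly": 22.2, "moderately": 8.9,
                "highly": 2.9, "extremely": 0.3}
saline_share = sum(v for k, v in class_shares.items() if k != "non-saline")
print(round(reported_cv, 2), round(saline_share, 1))  # 0.87 34.3
```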

3.2. Correlations Between Soil Salinity and Spectral and Environmental Variables

A total of 99 characteristic variables were selected for modeling by combining multi-source satellite data with environmental variables; specific information is presented in Table 5. After the characteristic variables were rescaled to be dimensionless, Pearson correlation coefficients between each variable and the target variable were calculated to assess the degree of correlation. From the correlation coefficients between SSC and each characteristic variable in Figure 6, five main variables that are positively correlated with SSC were identified: Landsat 8 bands 17, 12, and 11, temperature, and the EEVI index. Of these, Landsat 8 band 17 exhibited the largest positive correlation coefficient (0.39). In contrast, the variables exhibiting a significant negative correlation with SSC were Landsat 8 band 10, compound fertilizer, potassium fertilizer, nitrogen fertilizer, and the DEM, with Landsat 8 band 10 showing the strongest negative correlation (−0.41). These findings support the use of a comprehensive set of characteristic variables for identifying the key factors that influence changes in soil salinity, and they provide a basis for further investigation of the mechanisms underlying soil salinization.
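The correlation screening described above amounts to computing a Pearson coefficient for every feature against SSC and ranking the features by its absolute value. A minimal sketch follows; the function names, feature names, and values are placeholders, not data from the study.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sd_x * sd_y)


def rank_by_correlation(features: Dict[str, Sequence[float]],
                        target: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (name, r) pairs sorted from strongest to weakest |r|."""
    scored = [(name, pearson_r(values, target)) for name, values in features.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Placeholder features and target (made-up numbers).
ssc = [1.0, 2.0, 3.0, 4.0]
features = {
    "feature_up": [1.0, 2.0, 3.0, 4.0],     # rises with SSC
    "feature_down": [4.0, 3.0, 2.0, 1.0],   # falls as SSC rises
    "feature_mixed": [1.0, 3.0, 2.0, 4.0],  # weaker positive relation
}
for name, r in rank_by_correlation(features, ssc):
    print(f"{name}: {r:+.2f}")
```

Ranking by absolute value keeps strongly negative features, such as the fertilizer variables reported above, alongside the strongly positive ones.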

3.3. Satellite Data to Monitor Soil Salinity Results

In this study, the data were divided into a training set and a test set at a ratio of 3:1 so that the generalization ability of the models could be evaluated. The results presented in Table 6 are based on the test set. Prior to feature engineering, the data were normalized to eliminate the influence of dimension and improve the efficiency of model training. Of the four feature datasets, the RF_average dataset gave the best performance. On this dataset, ResNet achieved the highest prediction accuracy of 0.54, followed by 1D-CNN with 0.52 and KAN with 0.49. Among the machine learning models, extremely randomized trees performed best, with an accuracy of 0.43, followed by random forest, XGBoost, and LightBoost, all of which were tuned by grid search. Chen et al. [49] pointed out that remote sensing of the spatial distribution of soil salinity suffers from insufficient accuracy because of the mismatch between the scale of ground point observations and that of remote sensing pixels. Jia [50] compared UAV and satellite monitoring and found that satellite monitoring performed poorly, with R² values of 0.42 and 0.32 for the modeling and validation sets, respectively. Wang et al. [51] collected a total of 211 samples from Iran and Xinjiang as independent validation datasets and found that the accuracy of satellite salinity monitoring varied between 0.21 and 0.87 across arid and non-arid areas. Liu [52] showed that drone-scale monitoring reached an R² of 0.89, higher than that of the original satellite model (0.63), and that correcting the satellite model with drone data improved its prediction accuracy to 0.79.
Nabiollahi [53] monitored soil salinity in Iran using ten-fold cross-validation on 295 samples, with R² in the range of 0.54–0.67. Ivushkin [54] predicted salinity in agricultural fields using drones and found that combining data from several sensors improved predictive ability, reaching an R² of 0.46 for the whole dataset and 0.64 for some subgroups. These results reflect regional variation, differences in sampling, and variability in the ability of the satellites themselves to monitor salinity. Although the predictive power is limited, satellite monitoring can still provide valuable information to guide soil management in specific situations.
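The 3:1 partition mentioned at the start of this subsection can be sketched as a seeded random shuffle followed by a cut at one quarter of the data. The function name and the seed are illustrative assumptions, and the study does not say whether the split was random or stratified. The sample count of 1044 is taken from the text, without accounting for any samples removed during outlier screening.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_three_to_one(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly, then hold out one quarter of the samples for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = len(samples) // 4
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]
    return train, test


sample_ids = list(range(1044))  # stand-ins for the 1044 sampling sites
train_ids, test_ids = split_three_to_one(sample_ids)
print(len(train_ids), len(test_ids))  # 783 261
```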

Table 5. Specific information on the characteristic variables.

Data Source	Feature Name	Count	Subtotal
Sentinel-2	Band1-12	12	35
	S₁–S₆(S2_S₁/S2_S₂/S2_S₃/S2_S₅/S2_S₆)	5
	SI(S2_SI/S2_SI₁/S2_SI₂/S2_SI₃)	4
	S2_Int(1-2)	2
	S2_TBI(1-3)	3
	NDSI/NDVI/ENDV/EVI/EEVI/CRSI/CORSI/GDVI/SDI	9
Landsat 8	Band1-18	18	41
	S₁–S₆(L8_S₁/L8_S₂/L8_S₃/L8_S₅/L8_S₆)	5
	SI(L8_SI/L8_SI₁/L8_SI₂/L8_SI₃)	4
	L8_Int(1-2)	2
	L8_TBI(1-3)	3
	NDSI/NDVI/ENDV/EVI/EEVI/CRSI/CORSI/GDVI/SDI	9
Other Environmental Variables	DEM/Slope/Slope direction/Topographic relief/Curvature/Mountain Shading/Terrain Roughness/Soil Type/Soil parent material/Clay Soil /Sandy Soil/Pink Sandy Soil/Population Distribution/Nighttime Lighting/Temperature/Rainfall/N/P/K/Compound fertilizer/Agricultural film/Railroad Density/Highway Density	23	23
			Total 99

The lower R² values in the regression task can be attributed to the following factors. First, in terms of environmental factors, the dataset is large and covers a wide area, and water resources, topographic relief, and fertilizer application vary greatly among regions of northern Xinjiang. Our study shows that nitrogen and compound fertilizers affect soil salinity more strongly than other factors, so differences in fertilizer application levels among regions have a great impact on the inferential ability of the model. Second, in terms of time, data collection extended over the whole month of April, so environmental conditions changed during the sampling period. In addition, uncertainty in meteorological factors, such as rainfall and temperature, also affects the observation of soil salinity, which makes regression modeling difficult. Climatic conditions differ significantly between areas of northern Xinjiang, and the absence of samples from some specific environments reduces the accuracy of the model on the dataset. Finally, the models themselves have limitations. Machine learning models can handle complex nonlinear relationships, but they depend heavily on the feature set. For the deep learning models, the limited number of bands provided by satellite imagery means that environmental factors such as soil moisture, water table, and vegetation cover cannot be adequately captured from the available data sources. Moreover, the environmental variables obtained are not sensitive enough to changes between sampling points to provide the exact attributes of each sample. As a result, the models could not learn features of sufficient depth, which limited their generalization ability and prediction accuracy.

The SHAP method was used to visualize the contribution of each variable to the models. Figure 7 shows the SHAP variable contribution plots for the RF_average dataset, which performed best among the four feature selection methods. KAN has a smooth activation behavior because it uses the SiLU activation function. Unlike traditional activation functions, such as ReLU, this smooth nonlinearity gives the model a more continuous response to different features, which leads to different patterns of feature contribution in the SHAP plot. As illustrated in Figure 7, across the deep learning models, compound fertilizer usage, Landsat 8 band 17, and Sentinel-2 band 12 contribute the most. Among the environmental factors, the impacts of compound and nitrogen fertilizers are prominent, and the salinity index from Landsat 8 outperforms that from Sentinel-2. These findings indicate that the application of chemical fertilizers significantly influences the spatial distribution of soil salinity. Additionally, the differences between the multispectral satellite datasets reflect differences in observational technology among sensors. In the machine learning models, the S₁ salinity index contributes the most, followed by environmental factors and the original Landsat 8 bands.

3.4. Results of Satellite Data Monitoring the Degree of Soil Salinization

In the modeling study of soil salinization degree, the ResNet model was effective at processing two-dimensional data owing to its inverted residual structure, while the KAN model has shown strong performance and interpretability on mathematical and scientific problems. Among the machine learning models, ET and RF performed well in the salinization monitoring task. As shown in Table 7, of the four feature variable datasets, the SPA feature dataset performed best: KAN achieved the highest classification accuracy of 0.75, ResNet attained 0.71, and RF and extremely randomized trees performed comparably.

To assess the risk of model predictions, we used the log-loss indicator to evaluate the model by measuring the difference between the predicted distribution and the actual value distribution. A lower log-loss indicates a lower risk and, thus, more stable predictions. Visualizing this risk indicator further ensures confidence in the model’s predictive process.

$$\mathrm{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij}) \tag{9}$$

where $N$ denotes the number of samples, $M$ denotes the number of categories, $y_{ij}$ is an indicator variable that equals 1 if sample $i$ actually belongs to category $j$ and 0 otherwise, and $p_{ij}$ is the predicted probability that sample $i$ belongs to category $j$.
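Because the indicator is 1 only for the true class, the inner sum in Equation (9) reduces to the logarithm of the probability assigned to the correct category. The sketch below exploits this. The function name, the clipping constant `eps`, and the example probabilities are illustrative assumptions.

```python
import math
from typing import Sequence


def log_loss(labels: Sequence[int], probabilities: Sequence[Sequence[float]],
             eps: float = 1e-15) -> float:
    """Equation (9) for integer class labels and per-class predicted probabilities."""
    total = 0.0
    for label, row in zip(labels, probabilities):
        # Only the true class contributes; clip to avoid log(0).
        p_true = min(max(row[label], eps), 1.0 - eps)
        total += math.log(p_true)
    return -total / len(labels)


# Made-up predictions for three samples and three classes.
labels = [0, 1, 2]
confident = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9]]
uncertain = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
print(log_loss(labels, confident) < log_loss(labels, uncertain))  # True
```

Confident and correct predictions push the value toward zero, whereas spreading probability across classes raises it, which is why a lower log-loss is read above as lower prediction risk.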

In the log-loss visualization in Figure 8, the loss values of the five models lie in the range of 0 to 0.46, indicating that, for the task of salinity degree prediction, the predicted probabilities of all the models are close to the actual labels, which reflects the accuracy and reliability of these models on this task. The traditional machine learning models are more stable than the deep learning models; among the deep learning models, KAN has the lowest prediction uncertainty, and among the machine learning models, RF has the highest prediction probability.

3.5. Analysis and Mapping of Soil Salinization in Arid Farmlands of Northern Xinjiang

The results of the salinity mapping analysis provide a foundation for formulating soil management strategies and preventive measures, and will facilitate the timely implementation of corresponding control measures. For farmland areas under human management, the results in Figure 9 show that the SSC of the first type of soil ranges from 0 to 1.03 g · kg⁻¹, indicating non-saline conditions. The second and third types are transitional between non-saline and saline soil, with the latter distributed mainly in the eastern part of the study area. The SSC values of the fourth type of soil range from 2.11 g · kg⁻¹ to 10.82 g · kg⁻¹, indicating saline soil. These areas are distributed in the western part of the study area and require attention and management. The degree of soil salinization in the central part of the study area is relatively low; however, some of this soil is still in transition to saline soil. The eastern part of the study area also requires attention, as it already shows a slight tendency towards salinization.

4. Discussion

4.1. Application of Sentinel-2 and Landsat 8 Remote Sensing Data to Monitor Soil Salinization in the Farmland of Northern Xinjiang

The prediction and monitoring of soil salinization provide crucial decision-making support for managing salinization in the farmland areas of northern Xinjiang. Numerous studies have demonstrated that the integrated use of multi-source remote sensing data can enhance the extraction of subtle information, while the complementarity of the data can raise the spatial and temporal resolution of remote sensing images [55,56]. In this study, the salinity indices performed better than the other spectral indices, with the S₁ salinity index having the greatest impact [57]. For soil salinity, as distinct from soil surface indices, satellite-based Earth observation technology is a powerful detection tool. Owing to the arid and semi-arid climate of northern Xinjiang, soil salinity content and topographic features are significantly affected by high temperatures and low rainfall. However, in cultivated farmland, farmers respond to unfavorable natural conditions, such as extreme dryness or scorching heat, with irrigation or artificial rain enhancement in order to alleviate soil crusting and salinization and to safeguard the normal growth of crops [51]. The contributions of compound fertilizer and nitrogen fertilizer were particularly significant in the salinity modeling process, emphasizing the importance of fertilization as a key environmental factor in predicting soil salinity; this is in line with the finding of Omuto et al. that environmental factors are influential in revealing soil properties [58]. Changes in compound fertilizer application are also increasingly affecting the normal functioning of soils and influencing soil quality [39].
Given the vastness of the study area, its topographic variability, and geographical differences in rainfall, temperature, and fertilizer application methods, model predictions face a high degree of noise interference, and the prediction accuracy currently reaches a maximum of 0.54. Despite these challenges, monitoring remains of great practical value in guiding regional salinity management strategies.

4.2. The Influence of Characteristic Variable Selection on Monitoring Soil Salinization in the Farmland of Northern Xinjiang

Given the extensive geographical scope and distribution of the soil regions, and the fact that soil salinity is not directly discernible at the surface, additional auxiliary variables are needed to establish a correlation with the spectral indices of remote sensing images and thereby improve the accuracy and reliability of prediction. Combining spectral bands through mathematical formulas to construct vegetation indices, salinity indices, and similar indices makes better use of the information in the spectral data. In addition, terrain indices derived from the digital elevation model, together with the brightness, greenness, and humidity indices of the remote sensing images, reflect surface characteristics and potential soil properties through the texture characteristics of the images. In this study, a strategy of multi-model and multi-feature fusion improved the applicability and prediction accuracy of the models, providing theoretical and methodological support for the effective monitoring of soil salinization. These integrated indices take into account soil properties, topographic features, vegetation information, and transformation features, and they allow soil properties in the shallow topsoil layer to be analyzed from multiple perspectives, providing an analytical framework for understanding soil conditions.

4.3. The Great Potential Shown by Neural Network Techniques in Soil Salinity Monitoring in Cotton Growing Areas in Northern Xinjiang

Neural network technology is recognized as a powerful tool for data processing because of its ability to efficiently extract complex multi-dimensional information, and it has demonstrated exceptional performance in pattern recognition, image analysis, and language understanding, especially on nonlinear and high-dimensional data [59,60]. Unlike traditional machine learning methods [61], deep learning, through its complex architectural design, can dig deeper into the associations and interdependencies between features and targets [62], showing a clear advantage in revealing implicit connections in data. For highly complex modeling problems, such as salt prediction, neural network models often outperform traditional machine learning models, mainly because their multilevel nonlinear structures can learn abstract, high-level features of the data and thereby achieve more accurate predictions. This makes neural networks particularly useful for analyzing environmental data with complex interactions. ResNet, with its residual connections, effectively mitigates the problem of vanishing and exploding gradients and demonstrates strong noise immunity, which is why it performs well in the task of soil attribute regression. Large-scale soil salinity monitoring faces significant challenges because of its extensive spatial coverage and the pervasive high noise levels in remotely sensed data. Deep neural networks, with their powerful feature extraction capabilities, offer the potential to overcome these challenges. However, training deep neural networks typically requires massive datasets to mitigate noise and accurately capture the complex relationships governing soil salinity variability. With limited data, simpler network architectures with fewer layers, such as MLP or KAN, reduce the risk of overfitting and the computational cost.
Therefore, augmenting soil salinity datasets with rich auxiliary information, such as environmental variables or remotely sensed indices, is crucial for improving the robustness and reliability of both shallow and deep neural network models in soil salinity monitoring. KAN, an emerging neural network model intended as an alternative to the traditional multilayer perceptron (MLP), has shown excellent results in theoretical validation and mathematical proofs of principle. In the application case of monitoring soil salinity, it not only exceeded the conventional neural networks in classification accuracy but also offers enhanced interpretability. The study therefore suggests that adopting the KAN model is an efficient strategy for monitoring soil salinity through satellite technology, and that the model can play a key role in predicting soil salinity trends owing to its accuracy and robust data-handling capabilities. Overall, the assessment showed that neural network technology is effective in monitoring both soil salinity and salinity severity.

Figure 10 illustrates the distribution of the KAN layer spline weights in each channel. The horizontal axis represents the number of features in the dataset, and the color shading (transitioning from blue to yellow) reflects the magnitude of the weights, with blue indicating smaller weights and yellow indicating larger weights. This color coding helps the observer quickly identify the key areas of weighting in each channel, which in turn provides insight into the characteristics and patterns of input feature transitions across channels.

4.4. Analysis and Solutions for Soil Salinity Prediction: Regression and Classification Insights

The regression analysis of soil salinity makes the complementary nature of multi-source satellite data evident. Among the four deep learning models analyzed, the original bands of Landsat 8 and Sentinel-2 each demonstrated their strengths. Specifically, band 17 of Landsat 8 contributed strongly in the CNN and KAN models, while bands 12 and 1 of Sentinel-2 were significant contributors in the MLP and ResNet models. Furthermore, in all machine learning models, the salinity index S₁ of Landsat 8 was the most prominent feature. Beyond the original bands, the most critical features in the deep learning models were nitrogen fertilizer, compound fertilizer, and temperature. These findings underscore the significant impact of fertilizers on soil health, particularly the water-soluble salts in nitrogen fertilizer, whose improper use can lead to salt accumulation in the soil and thereby damage soil structure and function. In arid regions, high temperatures increase the evaporation of soil moisture, which leaves salts in the surface layer and accelerates soil salinization. Given the risk of gradually increasing soil salinization in farmland in northern Xinjiang, the key to reducing this risk is to optimize fertilizer application strategies, apply irrigation techniques rationally, and use remote sensing for real-time decision support. Optimizing the fertilization strategy involves the scientific application of organic and chemical fertilizers to avoid over-fertilization, which leads to salt accumulation. The rational use of irrigation technology requires precise irrigation according to soil and crop water requirements to reduce salt accumulation in the soil. Remote sensing technology is used to monitor salinity and provide data to support decision-making. In summary, comprehensive measures can effectively reduce the risk of soil salinization in farmland in northern Xinjiang and ensure sustainable agricultural development.

4.5. Satellite Remote Sensing for Monitoring Soil Salinity Enhances Soil Salinization Understanding

Given the special geographic location and climatic environment of northern Xinjiang, the degree of salinization has been significantly alleviated under long-term farming management. Reasonable irrigation measures can effectively regulate salinity in farmland, and optimizing the existing irrigation system supports sustainable and healthy soil development and, in turn, higher grain yields. On this basis, a fertilization strategy matched to the distribution of salts in farmland is proposed to reduce the application of chemical fertilizers and maintain the farmland ecological environment. This not only secures grain yield but also improves grain quality. For high-salinity areas, the development of salt-tolerant crops is proposed, and crop rotation is adopted in low-salinity areas, so that alternating cash crops and salt-tolerant crops restores soil health and supports the sustainable development of agriculture. With irrational fertilization and irrigation, the groundwater table depth and the accumulation of salt in the water body can lead to excessive soil salinity. If this water is not drained in time, salt rises with evaporating water and accumulates more severely in the soil. Secondary soil salinization is an important cause of declining soil permeability and deteriorating soil properties and function, and it seriously affects agricultural production. To ensure the health of the land and the sustainable development of agriculture, corresponding technical measures are necessary. Salinity distribution maps can effectively alert agricultural growers and help prevent secondary salinization.

4.6. Summary and Outlook for Multi-Source Satellite Remote Sensing

This study uses Sentinel-2 and Landsat 8 satellite data to monitor and analyze soil salinization. The complementary strengths of these two data sources provide high-quality surface information. Sentinel-2's high spatial resolution (10–20 m) and high temporal resolution (5-day revisit) are suitable for small-scale, high-precision monitoring, and its multispectral bands effectively capture the spectral characteristics of surface salinity; however, it is susceptible to cloud cover and complex terrain, and it generates large datasets. Landsat 8's moderate spatial resolution (30 m) and 16-day revisit cycle are suitable for large-scale monitoring, and its long-term data archive facilitates time-series analysis and trend studies; however, its lower temporal resolution limits real-time monitoring of rapidly changing areas, and its coarser spatial resolution hinders the detection of subtle salinization features in small regions. This study integrates Sentinel-2 and Landsat 8 data to achieve large-scale soil salinization monitoring with higher precision. In the future, multi-source remote sensing data fusion techniques, combined with advanced algorithms such as deep learning and spectral unmixing, are expected to further improve the spatiotemporal resolution, accuracy, and efficiency of soil salinization monitoring and to enable more comprehensive and precise salinization management.

5. Conclusions

This study integrates multiple remote sensing data sources, including multi-band spectral information, terrain features, and vegetation coverage information, and applies four feature selection methods (RF, CARS, UVE, and SPA) to build predictive models. Among the 1044 collected soil samples, non-saline soil accounts for 65.7%; the remaining 34.3% is mainly slightly saline soil (22.2%), and extremely saline soil makes up only 0.3%. The models span deep learning (CNN, ResNet, MLP, and KAN) and classic machine learning algorithms (ETree, RF, XGBoost, and LightBoost). A systematic comparison of data combinations and model performance shows that the composite fertilizer and nitrogen fertilizer variables are important features, highlighting the relationship between environmental factors and soil salinity content. In terms of model performance, ResNet achieved the highest R² (0.54) in quantitatively predicting soil salinity in the cotton planting area of northern Xinjiang, while the KAN model performed best in classifying the degree of salinization, with an accuracy of up to 0.75. Furthermore, this study maps the distribution of soil salinity in northern Xinjiang, distinguishing heavily saline areas from low-salinity zones. The western region exhibits saline characteristics, while the central and eastern regions remain in a low-salinity state. These findings provide decision support for improving soil environments and formulating rational crop layout strategies.

Author Contributions

Conceptualization, P.G., M.Z. and X.F.; methodology, M.Z. and X.H.; software, M.Z.; validation, L.G., X.G., J.P. and F.T.; formal analysis, L.G.; investigation, M.Z.; resources, X.F.; data curation, L.G. and X.H.; writing—original draft preparation, M.Z.; writing—review and editing, P.G.; visualization, X.G. and J.P.; supervision, P.G.; project administration, P.G. and F.T.; funding acquisition, P.G. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Corps’ Youth Science and Technology Innovation Talent Program, Project No. 2023CB008-16, and the National Natural Science Foundation of China, Project No. 62265015.

Data Availability Statement

Contact the authors for access to relevant research data and code.

Acknowledgments

We thank the editors and the reviewers for their useful feedback that improved this paper.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Hassani, A.; Smith, P.; Shokri, N. Negative correlation between soil salinity and soil organic carbon variability. Proc. Natl. Acad. Sci. USA 2024, 121, e2317332121.
2. Shoot Zhezati, A.; Zhang, T. How Crops Grow Competitively on Saline and Alkaline Land—A Series of Reports on the Management of Saline and Alkaline Land in Xinjiang (1). Available online: http://www.egi.cas.cn/xwdt/mtsm/202311/t20231115_6933482.html (accessed on 14 November 2023).
3. Nuer Shawule, H. Causes of Soil Salinization in Xinjiang and Its Prevention Countermeasures. Sci. Technol. Innov. 2020, 52–53.
4. Tan, W.; Liu, Y.; Dong, J.; Yang, Y.; Huang, J. Inversion of Soil Water-Soluble Salt Ion Content in Saline-Alkali Soil Based on Sentinel-2 Satellite Imagery and Soil Variables. China Rural Water Hydropower 2024, 228, 210–217.
5. Bandak, S.; Movahedi-Naeini, S.A.; Mehri, S.; Lotfata, A. A longitudinal analysis of soil salinity changes using remotely sensed imageries. Sci. Rep. 2024, 14, 10383.
6. Wang, J.; Wang, J.; Chen, S.; Luo, J.; Sun, M.; Sun, J.; Yuan, J.; Guo, J. Study on the Variations in Water Storage in Lake Qinghai Based on Multi-Source Satellite Data. Remote Sens. 2023, 15, 1746.
7. Mahanta, A.R.; Rawat, K.S.; Kumar, N.; Szabo, S.; Srivastava, P.K.; Singh, S.K. Assessment of multi-source satellite products using hydrological modelling approach. Phys. Chem. Earth Parts A/B/C 2024, 133, 103507.
8. Wang, D.; Chen, H.; Wang, Z.; Ma, Y. Inversion of soil salinity according to different salinization grades using multi-source remote sensing. Geocarto Int. 2022, 37, 1274–1293.
9. Zhang, Y.; Li, G.; Li, H.; Chen, C.; Shao, W.; Zhou, Y.; Wang, D. A Spatiotemporal Comparison and Assessment of Multisource Satellite Derived Sea Ice Thickness in the Arctic Thinner Ice Region. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 8710–8723.
10. El-Rawy, M.; Sayed, S.Y.; AbdelRahman, M.A.E.; Makhloof, A.; Al-Arifi, N.; Abd-Ellah, M.K. Assessing and segmenting salt-affected soils using in-situ EC measurements, remote sensing, and a modified deep learning MU-NET convolutional neural network. Ecol. Inform. 2024, 81, 102652.
11. Mukhamediev, R.I.; Merembayev, T.; Kuchin, Y.; Malakhov, D.; Zaitseva, E.; Levashenko, V.; Popova, Y.; Symagulov, A.; Sagatdinova, G.; Amirgaliyev, Y. Soil Salinity Estimation for South Kazakhstan Based on SAR Sentinel-1 and Landsat-8,9 OLI Data with Machine Learning Models. Remote Sens. 2023, 15, 4269.
12. Salem, O.H.; Jia, Z. Evaluation of Different Soil Salinity Indices Using Remote Sensing Techniques in Siwa Oasis, Egypt. Agronomy 2024, 14, 723.
13. Golestani, M.; Mosleh Ghahfarokhi, Z.; Esfandiarpour-Boroujeni, I.; Shirani, H. Evaluating the spatiotemporal variations of soil salinity in Sirjan Playa, Iran using Sentinel-2A and Landsat-8 OLI imagery. CATENA 2023, 231, 107375.
14. Shi, H.; Hellwich, O.; Luo, G.; Chen, C.; He, H.; Ochege, F.U.; Voorde, T.V.d.; Kurban, A.; Maeyer, P.d. A Global Meta-Analysis of Soil Salinity Prediction Integrating Satellite Remote Sensing, Soil Sampling, and Machine Learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
15. Duan, M.; Song, X.; Li, Z.; Zhang, X.; Ding, X.; Cui, D. Identifying soil groups and selecting a high-accuracy classification method based on multi-textural features with optimal window sizes using remote sensing images. Ecol. Inform. 2024, 81, 102563.
16. Jia, P.; He, W.; Hu, Y.; Liang, Y.; Liang, Y.; Xue, L.; Zamanian, K.; Zhao, X. Inversion of coastal cultivated soil salt content based on multi-source spectra and environmental variables. Soil Tillage Res. 2024, 241, 106124.
17. Zhao, S.; Ayoubi, S.; Mousavi, S.R.; Mireei, S.A.; Shahpouri, F.; Wu, S.-X.; Chen, C.-B.; Zhao, Z.-Y.; Tian, C.-Y. Integrating proximal soil sensing data and environmental variables to enhance the prediction accuracy for soil salinity and sodicity in a region of Xinjiang Province, China. J. Environ. Manag. 2024, 364, 121311.
18. Emami, M.; Khormali, F.; Pahlavan-Rad, M.R.; Ebrahimi, S. Digital modeling of surface and subsurface soil salinity in Golestan Province, Iran. Geoderma Reg. 2024, 37, e00800.
19. Salcedo, F.P.; Cutillas, P.P.; Cabañero, J.J.A.; Vivaldi, A.G. Use of remote sensing to evaluate the effects of environmental factors on soil salinity in a semi-arid area. Sci. Total Environ. 2022, 815, 152524.
20. Ge, X.; Ding, J.; Teng, D.; Wang, J.; Huo, T.; Jin, X.; Wang, J.; He, B.; Han, L. Updated soil salinity with fine spatial resolution and high accuracy: The synergy of Sentinel-2 MSI, environmental covariates and hybrid machine learning approaches. CATENA 2022, 212, 106054.
21. He, Y.; Yin, H.; Chen, Y.; Xiang, R.; Zhang, Z.; Chen, H. Soil Salinity Estimation Based on Sentinel-1/2 Texture Features and Machine Learning. IEEE Sens. J. 2024, 24, 15302–15310.
22. Wang, N.; Peng, J.; Xue, J.; Zhang, X.; Huang, J.; Biswas, A.; He, Y.; Shi, Z. A framework for determining the total salt content of soil profiles using time-series Sentinel-2 images and a random forest-temporal convolution network. Geoderma 2022, 409, 115656.
23. Li, X.; Li, Y.; Wang, B.; Sun, Y.; Cui, G.; Liang, Z. Analysis of spatial-temporal variation of the saline-sodic soil in the west of Jilin Province from 1989 to 2019 and influencing factors. CATENA 2022, 217, 106492.
24. Aksoy, S.; Sertel, E.; Roscher, R.; Tanik, A.; Hamzehpour, N. Assessment of soil salinity using explainable machine learning methods and Landsat 8 images. Int. J. Appl. Earth Obs. Geoinf. 2024, 130, 103879.
25. Cui, X.; Han, W.; Zhang, H.; Dong, Y.; Ma, W.; Zhai, X.; Zhang, L.; Li, G. Estimating and mapping the dynamics of soil salinity under different crop types using Sentinel-2 satellite imagery. Geoderma 2023, 440, 116738.
26. Akca, S.; Gungor, O. Semantic segmentation of soil salinity using in-situ EC measurements and deep learning based U-NET architecture. CATENA 2022, 218, 106529.
27. Mohammadifar, A.; Gholami, H.; Golzari, S.; Collins, A.L. Spatial modelling of soil salinity: Deep or shallow learning models? Environ. Sci. Pollut. Res. 2021, 28, 39432–39450.
28. Lu, R. Soil Agro-Chemical Analyses; China Agricultural Scientech Press: Beijing, China, 2000; pp. 106–253.
29. Brady, N.C.; Weil, R.R. The Nature and Properties of Soils; Prentice Hall: Upper Saddle River, NJ, USA, 2008; Volume 13.
30. Yan, G.; Tang, G.; Chen, J.; Li, F.; Yang, X.; Xiong, L.; Lu, D. Modeling computer sight based on DEM data to detect terrain breaks caused by gully erosion on the loess Plateau. CATENA 2024, 237, 107837.
31. Choudhury, T. An Overview of Geomorphological Mapping: A Case Study of Rimbi Chhu River Basin, Sikkim, India. J. Geogr. Environ. Earth Sci. Int. 2024, 28, 65–84.
32. Ioniță, A.; Lungu, M.; Baleț, M.; Todor, A. The geomorphometric atlas of Romania: An open-access database on landform classifications and morphometric variables. J. Maps 2024, 20, 2354712.
33. Silleos, N.G.; Alexandridis, T.K.; Gitas, I.Z.; Perakis, K. Vegetation indices: Advances made in biomass estimation and vegetation monitoring in the last 30 years. Geocarto Int. 2006, 21, 21–28.
34. Wu, D.; Jia, K.; Zhang, X.; Zhang, J.; Abd El-Hamid, H.T. Remote Sensing Inversion for Simulation of Soil Salinization Based on Hyperspectral Data and Ground Analysis in Yinchuan, China. Nat. Resour. Res. 2021, 30, 4641–4656.
35. Jarchow, C.J.; Didan, K.; Barreto-Muñoz, A.; Nagler, P.L.; Glenn, E.P. Application and Comparison of the MODIS-Derived Enhanced Vegetation Index to VIIRS, Landsat 5 TM and Landsat 8 OLI Platforms: A Case Study in the Arid Colorado River Delta, Mexico. Sensors 2018, 18, 1546.
36. Fernández-Buces, N.; Siebe, C.; Cram, S.; Palacio, J.L. Mapping soil salinity using a combined spectral response index for bare soil and vegetation: A case study in the former lake Texcoco, Mexico. J. Arid Environ. 2006, 65, 644–667.
37. Wu, W. The Generalized Difference Vegetation Index (GDVI) for Dryland Characterization. Remote Sens. 2014, 6, 1211–1233.
38. Yungang, L.I.; Jiaonan, H.E.; Xue, L.I. Hydrological and meteorological droughts in the Red River Basin of Yunnan Province based on SPEI and SDI Indices. Prog. Geogr. 2016, 35, 758–767.
39. Lin, S.; Wang, Q.; Deng, M.; Wei, K.; Sun, Y.; Tao, W. The mechanism of using magnetized-ionized water in combination with organic fertilizer to enhance soil health and cotton yield. Sci. Total Environ. 2024, 941, 173781.
40. Pontes, M.J.C.; Galvão, R.K.H.; Araújo, M.C.U.; Moreira, P.N.T.; Neto, O.D.P.; José, G.E.; Saldanha, T.C.B. The successive projections algorithm for spectral variable selection in classification problems. Chemom. Intell. Lab. Syst. 2005, 78, 11–18.
41. Li, H.; Liang, Y.; Xu, Q.; Cao, D. Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal. Chim. Acta 2009, 648, 77–84.
42. Centner, V.; Massart, D.-L.; de Noord, O.E.; de Jong, S.; Vandeginste, B.M.; Sterna, C. Elimination of Uninformative Variables for Multivariate Calibration. Anal. Chem. 1996, 68, 3851–3858.
43. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
44. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
45. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
46. Meng, Q. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
47. Chen, W.; Yang, H.; Zhang, F.; Yang, X. Spatial and temporal variability of soil salinity in cotton field in Shihutan irrigation area in the northern Xinjiang. Cotton Sci. 2022, 34, 546–555.
48. Zhuang, Q.; Wu, S.; Yang, Y.; Niu, Y.; Yan, Y. Spatiotemporal characteristics of different degrees of salinized cultivated land in Xinjiang in recent ten years. J. Univ. Chin. Acad. Sci. 2021, 38, 341–349.
49. Chen, H.; Wu, J.; Xu, C. Monitoring Soil Salinity Classes through Remote Sensing-Based Ensemble Learning Concept: Considering Scale Effects. Remote Sens. 2024, 16, 642.
50. Jia, J.; Chen, C.; Liu, Q.; Ding, B.; Ren, Z.; Jia, Y.; Bai, X.; Du, R.; Chen, Q.; Wang, S.; et al. Soil salinity monitoring model based on the synergistic construction of ground-UAV-satellite data. Soil Use Manag. 2024, 40, e12980.
51. Wang, N.; Chen, S.; Huang, J.; Frappart, F.; Taghizadeh, R.; Zhang, X.; Wigneron, J.-P.; Xue, J.; Xiao, Y.; Peng, J.; et al. Global Soil Salinity Estimation at 10 m Using Multi-Source Remote Sensing. J. Remote Sens. 2024, 4, 0130.
52. Liu, R.; Jia, K.; Li, H.; Zhang, J. Using Unmanned Aerial Vehicle Data to Improve Satellite Inversion: A Study on Soil Salinity. Land 2024, 13, 1438.
53. Nabiollahi, K.; Taghizadeh-Mehrjardi, R.; Shahabi, A.; Heung, B.; Amirian-Chakan, A.; Davari, M.; Scholten, T. Assessing agricultural salt-affected land using digital soil mapping and hybridized random forests. Geoderma 2021, 385, 114858.
54. Ivushkin, K.; Bartholomeus, H.; Bregt, A.K.; Pulatov, A.; Franceschini, M.H.D.; Kramer, H.; van Loo, E.N.; Jaramillo Roman, V.; Finkers, R. UAV based soil salinity assessment of cropland. Geoderma 2019, 338, 502–512.
55. Cheng, Q.; Xie, R.; Wu, J.; Ye, F. Deep Learning-Based Spatiotemporal Fusion Architecture of Landsat 8 and Sentinel-2 Data for 10 m Series Imagery. Remote Sens. 2024, 16, 1033.
56. Tang, X.; Bratley, K.H.; Cho, K.; Bullock, E.L.; Olofsson, P.; Woodcock, C.E. Near real-time monitoring of tropical forest disturbance by fusion of Landsat, Sentinel-2, and Sentinel-1 data. Remote Sens. Environ. 2023, 294, 113626.
57. Cao, X.; Chen, W.; Ge, X.; Chen, X.; Wang, J.; Ding, J. Multidimensional soil salinity data mining and evaluation from different satellites. Sci. Total Environ. 2022, 846, 157416.
58. Omuto, C.T.; Kome, G.K.; Ramakhanna, S.J.; Muzira, N.M.; Ruley, J.A.; Jayeoba, O.J.; Raharimanana, V.; Owusu Ansah, A.; Khamis, N.A.; Mathafeng, K.K.; et al. Trend of soil salinization in Africa and implications for agro-chemical use in semi-arid croplands. Sci. Total Environ. 2024, 951, 175503.
59. Drakonakis, G.I.; Tsagkatakis, G.; Fotiadou, K.; Tsakalides, P. OmbriaNet—Supervised Flood Mapping via Convolutional Neural Networks Using Multitemporal Sentinel-1 and Sentinel-2 Data Fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 2341–2356.
60. Hafner, S.; Nascetti, A.; Azizpour, H.; Ban, Y. Sentinel-1 and Sentinel-2 Data Fusion for Urban Change Detection Using a Dual Stream U-Net. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
61. Du, R.; Xiang, Y.; Chen, J.; Lu, X.; Wu, Y.; He, Y.; Xiang, R.; Zhang, Z.; Chen, Y. Potential of solar-induced chlorophyll fluorescence (SIF) to access long-term dynamics of soil salinity using OCO-2 satellite data and machine learning method. Geoderma 2024, 444, 116855.
62. Suh, J.W.; Zhu, Z.; Zhao, Y. Monitoring construction changes using dense satellite time series and deep learning. Remote Sens. Environ. 2024, 309, 114207.

Figure 1. Map of the study area.

Figure 2. Soil Sample Preparation Process.

Figure 3. (a) Spectral profile of the study region from Landsat 8 in GEE; (b) spectral profile of the study region from Sentinel-2 in GEE.

Figure 4. The fundamental architecture of KAN.

Figure 5. Main flow chart.

Figure 6. Correlation analysis of characteristic variables.

Figure 7. Contribution of variable characteristics to different machine learning methods. (a) 1D-CNN, (b) KAN, (c) MLP, (d) 1D-ResNet, (e) ET, (f) RF, (g) LightBoost, (h) XGBoost.

Figure 8. Log-loss visualization images between different models. (a) 1D-ResNet, (b) KAN, (c) MLP, (d) ET, (e) RF.

Figure 9. Study area salinization mapping results. In the diagram, color block 1 indicates the SSC range of (0–1.03 g · kg⁻¹), color block 2 corresponds to the SSC range of (1.03 g · kg⁻¹–1.17 g · kg⁻¹), color block 3 represents the SSC range of (1.72 g · kg⁻¹–2.11 g · kg⁻¹), and color block 4 signifies the SSC range of (2.11 g · kg⁻¹–10.82 g · kg⁻¹).

Figure 10. Spline function characteristic output diagram for four methods of feature selection. (a) CARS, (b) UVE, (c) SPA, (d) RF_average.

Table 1. Criteria for classifying different soil salinity levels.

Type	Classification Standard
non-saline	SSC < 2 g · kg⁻¹
slightly saline	2 ≤ SSC < 4 g · kg⁻¹
moderately saline	4 ≤ SSC < 6 g · kg⁻¹
strongly saline	6 ≤ SSC < 10 g · kg⁻¹
extremely saline	SSC ≥ 10 g · kg⁻¹
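The thresholds in Table 1 translate directly into a lookup. The following is a minimal sketch, assuming soil salt content (SSC) is given in g · kg⁻¹ and that each class includes its lower bound, as the table states; the function name `classify_ssc` is hypothetical and not taken from the paper.

```python
import bisect

# Class boundaries from Table 1, in g/kg. Each class includes its lower bound.
THRESHOLDS = [2.0, 4.0, 6.0, 10.0]
LABELS = [
    "non-saline",
    "slightly saline",
    "moderately saline",
    "strongly saline",
    "extremely saline",
]


def classify_ssc(ssc_g_per_kg: float) -> str:
    """Return the Table 1 salinity class for a soil salt content in g/kg."""
    if ssc_g_per_kg < 0:
        raise ValueError("soil salt content cannot be negative")
    # bisect_right counts how many boundaries are <= the value, which is
    # exactly the index of the class because lower bounds are inclusive.
    return LABELS[bisect.bisect_right(THRESHOLDS, ssc_g_per_kg)]


# The minimum, mean, and maximum SSC reported in Table 3.
classes = [classify_ssc(v) for v in (0.24, 1.93, 10.82)]
```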

Table 2. Environmental covariates selected for soil salinity monitoring.

Environmental Covariates	Explanation of Characteristic Variables	Formulas	References
S₁	Salinity index 1	$\frac{\mathrm{Blue}}{\mathrm{Red}}$	[16]
S₂	Salinity index 2	$\frac{\mathrm{Blue} - \mathrm{Red}}{\mathrm{Blue} + \mathrm{Red}}$	[16]
S₃	Salinity index 3	$\frac{\mathrm{Green} \times \mathrm{Red}}{\mathrm{Blue}}$	[16]
S₅	Salinity index 5	$\frac{\mathrm{Blue} \times \mathrm{Red}}{\mathrm{Green}}$	[16]
S₆	Salinity index 6	$\frac{\mathrm{Red} \times \mathrm{NIR}}{\mathrm{Green}}$	[16]
SI	Salt index	$\sqrt{\mathrm{Blue} \times \mathrm{Red}}$	[16]
SI₁	Salt index 1	$\sqrt{\mathrm{Green} \times \mathrm{Red}}$	[16]
SI₂	Salt index 2	$\sqrt{\mathrm{Green}^{2} + \mathrm{Red}^{2} + \mathrm{NIR}^{2}}$	[16]
SI₃	Salt index 3	$\sqrt{\mathrm{Red}^{2} + \mathrm{Green}^{2}}$	[16]
Int₁	Intensity index 1	$\frac{\mathrm{Green} + \mathrm{Red}}{2}$	[16]
Int₂	Intensity index 2	$\frac{\mathrm{Green} + \mathrm{Red} + \mathrm{NIR}}{2}$	[16]
NDSI	Normalized Difference Salinity Index	$\frac{\mathrm{Red} - \mathrm{NIR}}{\mathrm{Red} + \mathrm{NIR}}$	[16]
NDVI	Normalized Difference Vegetation Index	$\frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$	[33]
ENDVI	Enhanced Normalized Difference Vegetation Index	$\sqrt{\frac{\mathrm{NIR} + \mathrm{Green} - 2 \times \mathrm{Blue}}{\mathrm{NIR} + \mathrm{Green} + 2 \times \mathrm{Blue}}}$	[34]
EVI	Enhanced Vegetation Index	$\sqrt{\frac{2.5 \times (\mathrm{NIR} - \mathrm{Red})}{\mathrm{NIR} + 6 \times \mathrm{Red} - 7.5 \times \mathrm{Blue} + 1}}$	[33]
EEVI	Extended Enhanced Vegetation Index	$\sqrt{\frac{2.5 \times (\mathrm{NIR} + \mathrm{SWIR1} - \mathrm{Red})}{\mathrm{NIR} + \mathrm{SWIR1} + 6 \times \mathrm{Red} - 7.5 \times \mathrm{Blue} + 1}}$	[35]
CRSI	Canopy salt response index	$\sqrt{\frac{\mathrm{NIR} \times \mathrm{Red} - \mathrm{Green} \times \mathrm{Blue}}{\mathrm{NIR} \times \mathrm{Red} + \mathrm{Green} \times \mathrm{Blue}}}$	[34]
CORSI	Combined spectral response index	$\frac{\mathrm{Blue} + \mathrm{Green}}{\mathrm{Red} + \mathrm{NIR}} \times \mathrm{NDVI}$	[36]
GDVI	Green Difference Vegetation Index	$\frac{\mathrm{NIR} - \mathrm{Green}}{\mathrm{NIR} + \mathrm{Green}}$	[37]
SDI	Soil Difference Index	$\sqrt{(\mathrm{NDVI} - 1)^{2} - \mathrm{SI}^{2}}$	[38]
TB₁	Tasseled Cap Brightness		[39]
TB₂	Tasseled Cap Greenness		[39]
TB₃	Tasseled Cap Wetness		[39]
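Several of the covariates in Table 2 are simple band ratios, so they can be computed per pixel from surface reflectance. The sketch below evaluates a handful of them exactly as written in the table. It is illustrative only: the function name `salinity_indices` and the sample reflectance values are assumptions, and no masking of zero denominators or invalid pixels is attempted.

```python
import math


def salinity_indices(blue: float, green: float, red: float, nir: float) -> dict:
    """Compute a subset of the Table 2 covariates for one pixel.

    Inputs are assumed to be surface reflectances (unitless, roughly 0 to 1).
    Zero denominators are not handled in this sketch.
    """
    ndvi = (nir - red) / (nir + red)
    return {
        "S1": blue / red,
        "S2": (blue - red) / (blue + red),
        "SI": math.sqrt(blue * red),
        "SI1": math.sqrt(green * red),
        "Int1": (green + red) / 2,
        "NDSI": (red - nir) / (red + nir),
        "NDVI": ndvi,
        "CORSI": (blue + green) / (red + nir) * ndvi,
    }


# Hypothetical reflectances for a single pixel, not values from the study.
indices = salinity_indices(blue=0.1, green=0.2, red=0.2, nir=0.4)
```

With the formulas as tabulated, NDSI is the negative of NDVI, so the two carry the same information with opposite sign.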

Table 3. Statistical table of soil salinity data for the study area.

Soil Salinity Content	Max	Min	Average	Standard Deviation	Standard Error	Skewness	Kurtosis	Coefficient of Variation
Whole dataset (n = 1044)	10.82	0.24	1.93	1.67	1.07	1.76	3.71	0.87
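As a quick consistency check on this row, the coefficient of variation can be recomputed from the reported mean and standard deviation, assuming the usual definition (standard deviation divided by mean):

```latex
\mathrm{CV} = \frac{s}{\bar{x}} = \frac{1.67}{1.93} \approx 0.87
```

The result matches the tabulated value and indicates that the spread of soil salt content is of the same order as its mean.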

Table 4. Soil salinity classification statistics.

Soil Salinity Content	Non-Saline	Slightly Saline	Moderately Saline	Strongly Saline	Extremely Saline
Whole dataset (n = 1044)	65.7%	22.2%	8.9%	2.9%	0.3%

Table 6. Satellite monitoring of soil salinity modeling test results.

Feature Dataset	Models of Deep Learning								Models of Machine Learning
	1D-CNN		KAN		MLP		1D-ResNet		ET		RF		LightBoost		XGBoost
	R²	RMSE	R²	RMSE	R²	RMSE	R²	RMSE	R²	RMSE	R²	RMSE	R²	RMSE	R²	RMSE
CARS	0.44	0.65	0.32	0.25	0.33	0.72	0.43	0.68	0.44	0.65	0.42	0.71	0.40	0.75	0.44	0.67
UVE	0.50	0.57	0.37	0.24	0.34	0.65	0.52	0.55	0.48	0.54	0.44	0.68	0.45	0.67	0.43	0.69
SPA	0.47	0.61	0.35	0.25	0.37	0.64	0.48	0.61	0.47	0.53	0.43	0.69	0.40	0.77	0.41	0.73
RF_average	0.52	0.56	0.49	0.23	0.38	0.62	0.54	0.53	0.50	0.56	0.45	0.59	0.48	0.58	0.46	0.59
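For readers who want to reproduce this kind of regression comparison, the sketch below computes R² and RMSE from observed and predicted soil salt content. It uses the standard definitions, which are assumed here to match the paper's evaluation method (Section 2.8, not shown in this part of the text); the function name `r2_rmse` and the sample values are hypothetical.

```python
import math


def r2_rmse(observed, predicted):
    """Return (R^2, RMSE) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


# Illustrative values only, not data from the study.
r2, rmse = r2_rmse([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5])
```

RMSE carries the units of the target variable, so comparisons of RMSE across models are meaningful only when every model is evaluated on the same target scale.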

Table 7. Satellite monitoring of soil salinization modeling test results.

Feature Dataset	Model	Accuracy	Precision	Recall	F1-Score
CARS	1D-ResNet	0.68	0.64	0.68	0.65
	KAN	0.74	0.73	0.74	0.71
	MLP	0.71	0.66	0.70	0.67
	ET	0.69	0.61	0.69	0.65
	RF	0.70	0.60	0.70	0.65
UVE	1D-ResNet	0.70	0.66	0.70	0.67
	KAN	0.74	0.67	0.72	0.70
	MLP	0.72	0.62	0.72	0.66
	ET	0.69	0.61	0.69	0.64
	RF	0.70	0.63	0.70	0.65
SPA	1D-ResNet	0.71	0.65	0.71	0.66
	KAN	0.75	0.70	0.74	0.73
	MLP	0.74	0.69	0.74	0.70
	ET	0.70	0.63	0.70	0.65
	RF	0.67	0.63	0.67	0.65
RF_average	1D-ResNet	0.66	0.64	0.66	0.65
	KAN	0.73	0.69	0.73	0.70
	MLP	0.71	0.60	0.71	0.64
	ET	0.70	0.65	0.70	0.67
	RF	0.70	0.64	0.70	0.65
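In most rows of Table 7 the recall equals the accuracy, which is what support-weighted averaging over classes produces, because weighted recall reduces algebraically to overall accuracy. The sketch below computes the four metrics under that assumption. The averaging scheme is a guess rather than something this section states, and the function name `weighted_metrics` and the sample labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1), support-weighted over classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)        # true samples per class
    predicted = Counter(y_pred)      # predicted samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(hits.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = hits[label] / predicted[label] if predicted[label] else 0.0
        r = hits[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n           # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


# Illustrative labels only, not data from the study.
y_true = ["non", "non", "non", "slight", "slight", "moderate"]
y_pred = ["non", "non", "slight", "slight", "non", "moderate"]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
```

Because 65.7% of the samples are non-saline (Table 4), weighted scores are dominated by the majority class; per-class or macro-averaged scores would show how well the rarer saline classes are recognized.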

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
