Machine Learning
ScienceDirect
Advances in Space Research xxx (xxxx) xxx
www.elsevier.com/locate/asr
Received 28 April 2021; received in revised form 4 August 2021; accepted 14 August 2021
Abstract
The present study provides the first comprehensive evaluation of 12 advanced statistical and machine learning (ML) algorithms for Soil Moisture (SM) estimation from dual polarimetric Sentinel-1 radar backscatter. The ML algorithms used were support vector machine (SVM) with linear, polynomial, radial and sigmoid kernels, random forest (RF), multi-layer perceptron (MLP), radial basis function (RBF), Wang and Mendel’s (WM), subtractive clustering (SBC), adaptive neuro fuzzy inference system (ANFIS), hybrid neural fuzzy inference system (HyFIS), and dynamic evolving neural fuzzy inference system (DENFIS). Extensive field sampling was performed to collect in-situ SM data and other parameters from the selected sites on seven different dates and at two different locations (Varanasi and Guntur District, India), concurrent with Sentinel-1 overpasses. The backscattering coefficients were used as input variables and SM as the output variable for the training, validation and testing of the ML algorithms. The site at Varanasi was used for the training, validation and testing of the models. The Guntur site, on the other hand, was used as an independent site for checking the model performance before finalizing the algorithms. The performance of the trained algorithms was evaluated in terms of correlation coefficient (r), root mean square error (RMSE) (in m³/m³) and bias (in m³/m³). The study identified RF, SBC and ANFIS as the three best performing models, with comparable and promising SM estimation. To test the robustness of these models (RF, SBC and ANFIS), further performance analysis was carried out on the independent datasets of the Varanasi and Guntur test sites. It indicates that the performance of the three models was consistent and that SBC can be recommended as the best among them for SM estimation.
© 2021 COSPAR. Published by Elsevier B.V. All rights reserved.
Keywords: Sentinel-1; Artificial Intelligence; Machine Learning Algorithms; Soil Moisture; Optimization
1. Introduction
https://doi.org/10.1016/j.asr.2021.08.022
0273-1177/© 2021 COSPAR. Published by Elsevier B.V. All rights reserved.
Please cite this article as: S. K. Chaudhary, P. K. Srivastava, D. K. Gupta et al., Machine learning algorithms for soil moisture estimation using Sentinel-1: Model development and implementation, Advances in Space Research, https://doi.org/10.1016/j.asr.2021.08.022
cycle and land surface fluxes (Srivastava et al., 2019, 2014). Monitoring of SM at regular time intervals over cropped surfaces is important for effective irrigation management and development (Srivastava et al., 2013; Suman et al., 2020). Bazzi et al. (2019) also recognized the potential use of SM for the detection of heavy rainfall in the south of France by correlating SM and rainfall data. Another study investigated relative SM indicators for the reclamation of wetland sites in previously mined oil sands in Alberta, Canada (Zakharov et al., 2020). Radar can be used for SM estimation through backscatter measurements at different polarizations. Previous research shows that the microwave response at low frequencies (P- to L-band) is sensitive to the SM content over bare as well as vegetated surfaces because of its higher penetration capability (Shi et al., 1997). However, very few synthetic-aperture radar (SAR) systems onboard remote sensing satellites, e.g., Japanese Earth Resources Satellite 1 (JERS-1), Advanced Land Observation Satellite 1 (ALOS-1) and ALOS-2, operate at low frequency. SAR systems operating in the high-frequency C and X bands, i.e., Sentinel-1, Radar Imaging Satellite 1 (RISAT-1), RADARSAT-1 & 2, COnstellation of small Satellites for the Mediterranean basin Observation-SkyMed (COSMO-SkyMed) and TerraSAR-X, also provide good results in the accurate retrieval of SM. These high-frequency SAR systems have frequently been used in various studies for the mapping and retrieval of SM (Kumar et al., 2019; Paloscia et al., 2013; Prasad et al., 2009) and for crop monitoring (Kumar et al., 2018; Navarro et al., 2016).
Various empirical, semi-empirical and physically based models have been developed and successfully employed to retrieve SM over bare soil surfaces (Baghdadi et al., 2017; Dave et al., 2021). Some physical models, namely physical optics (PO), geometric optics (GO), the small perturbation model (SPM), and the advanced integral equation model (AIEM), are useful for the retrieval of SM only within a limited range of surface roughness and surface characteristics. Attema & Ulaby (1978) introduced a semi-empirical model based on radiative transfer theory, the water cloud model (WCM), for the retrieval of SM under crop cover and of vegetation parameters, which was later extended by various authors (Ulaby et al., 1984). However, the semi-empirical and physically based modelling approaches are very complex because the radiative transfer-based microwave response of bare and vegetated soil surfaces is not well understood. These models also have a larger number of parameters. Therefore, there is a need for parameter-free and less complex machine learning (ML) modelling approaches for the retrieval of SM from bare and vegetated soil surfaces. ML techniques have been very popular in the scientific community during the past two decades and may overcome the limitations of the models discussed above in the retrieval of SM using radar backscattering (Gupta et al., 2015, 2017; Kumar et al., 2019; Srivastava, 2017; Srivastava et al., 2013). Kumar et al. (2019) retrieved the SM underneath wheat, barley and corn using dual polarimetric Sentinel-1A microwave satellite data with support vector machine (SVM), random forest (RF) and artificial neural network (ANN) models. They showed that the performance of the ANN model was lower than that of the SVM and RF models. Gupta et al. (2017) compared the back propagation ANN, radial basis function (RBF) neural network and generalized regression (GR) neural network for SM estimation using bistatic scatterometer data. They suggested that the ML algorithms may provide promising results. Chai et al. (2010) used an ANN for the retrieval of SM using microwave data with spatial variability information of SM. Notarnicola et al. (2008) compared neural networks and Bayesian algorithms for the retrieval of SM using scatterometer and radiometer data for a variety of agricultural fields. They found that the neural network approach is better than the Bayesian one when the algorithm is trained with more parameters. Liu et al. (2017) proposed a new SM retrieval approach based on ultra-wide echoes and adaptive neuro fuzzy inference system (ANFIS) algorithms.
A recent study by Greifeneder et al. (2021) explored an ML-based approach with Google Earth Engine for real-time, cloud-based mapping of SM at high spatial resolution (50 m) by integrating data from Landsat-8 optical and thermal images, Sentinel-1 SAR images, and modelled data. The training and independent validation datasets were taken from the International Soil Moisture Network. Liu et al. (2021a) presented an approach to retrieve SM over farmland by combining SAR and optical data from Sentinel-1 and Sentinel-2, respectively. To establish the relationship between the various features and SM, two ML algorithms, viz. support vector regression (SVR) and GR neural network models, were used. They also used a convolutional neural network regressor (CNNR) to extract deep features from remote sensing data. They concluded that the CNNR model with the optimal feature combination can increase the SM retrieval accuracy. Liu et al. (2021b) used physical models to estimate SM in bare and vegetation-covered soil surfaces using dual-polarized Sentinel-1A backscattering coefficients (VV and VH). The WCM was used to remove the vegetation effect from the radar backscattering coefficients. The modified SM monitoring index (MSMMI) and modified perpendicular drought index (MPDI) from an optical source, i.e., Sentinel-2A data, were also used to estimate SM. They then used several ML models, namely GR, SVR, RF regression, and deep neural network (DNN), to integrate the optical and SAR data for improved SM estimation. They concluded that the integration improves the SM estimation accuracy.
Most researchers have used a few common ML algorithms; however, the performance of several other ML algorithms, such as Wang and Mendel’s (WM), subtractive clustering (SBC), hybrid neural fuzzy inference system (HyFIS) and dynamic evolving neural fuzzy inference system
(DENFIS) has not yet been tested for the retrieval of SM using SAR data. Thus, a rigorous analysis of various ML algorithms for SM retrieval is needed to understand their performance in different scenarios and to check the robustness of each model for high-resolution SM estimation. In view of the abovementioned problems, the present study focused on (1) evaluation of different statistical and ML algorithms for SM estimation using Sentinel-1A data, (2) rigorous optimization of the models with respect to different model parameters, and (3) validation of the models at different spatial and temporal scales for large-scale implementation.
The first campaign site covers the area in and around the Varanasi district of Uttar Pradesh, India. It is situated in Northern India and is considered the food bowl of India. It is among the listed sites for the Scatsat-1 SM campaign of the Space Applications Centre (SAC), Indian Space Research Organization (ISRO), and is equipped with both in-situ sensors and a hydrometeorological station. It is also one of the recommended sites for the National Aeronautics and Space Administration (NASA)-ISRO Synthetic Aperture Radar (NISAR) airborne campaign. NISAR is a joint Earth observation mission of NASA and ISRO that aims to observe and monitor complex Earth phenomena using sophisticated radar imaging techniques. It will be the first satellite mission to observe the Earth's surface at two different radar frequencies (L-band and S-band) simultaneously. The complementary L- and S-band observations support a variety of applications such as monitoring of glaciers and ice sheets, earthquake dynamics and volcanic eruptions, croplands and forests, coastal processes, natural hazards, etc. The NISAR airborne campaign was the precursor to the space-borne NISAR mission and aimed to build capacity for data processing and applications for the forthcoming NISAR mission.
A moist subtropical climate with large variations between winter and summer temperatures exists in this region. Alluvial and calcareous soils with texture varying from sandy loam to sandy clay loam are found in this region. The study area is a part of the Gangetic plain and lies approximately between 25°10ʹ and 25°35ʹ of latitude and 82°40ʹ and 83°11ʹ of longitude, with an approximate area of 1535 sq. km. The average altitude of this area is approximately 80 m above mean sea level, with a slope variation of 0–3%. The study area has an average annual temperature of 26.1 °C and an average rainfall of 998 mm. The area is very fertile, which makes it a very important region from an agricultural point of view.
The second study site lies in the Guntur district in the Indian state of Andhra Pradesh. It occupies an area of approximately 11,391 square kilometres and extends approximately between 15° 18´ and 16° 50´ of northern latitude and 70° 10´ and 80° 55´ of eastern longitude. The Krishna River forms the north-eastern and eastern boundary of the district, separating it from Krishna District. It has a tropical climate with an average annual temperature of 28.5 °C and an annual rainfall of about 905 mm. The dominant soil types are black cotton and red loamy soils. There is also alluvial soil along the banks of the Krishna River. Paddy, cotton, arhar, chillies, maize, sesame, black gram, red gram, tobacco, etc. are the main crops grown in this region. Canals and bore wells are the primary sources of irrigation in this region. The location map of the study sites is shown in Fig. 1.
Fig. 2 depicts the technical flow of the present study, which mainly includes generation of the terrain backscattering coefficient, preparation of training, validation and testing datasets, comparison of different ML models, SM retrieval, etc. The first step is the pre-processing of the downloaded Sentinel-1 Ground Range Detected (GRD) data to generate radiometrically terrain-corrected backscattering coefficients using the Sentinel Application Platform (SNAP) software. The experimental sample dataset for each site is created by considering VV and VH backscatter as predictors and the corresponding in-situ SM as the response variable. The experimental dataset collected in 2018–19 for the Varanasi site was divided into training and validation datasets using random sampling without replacement. The datasets of Varanasi (for 2019–20) and the Guntur site are kept separately for testing the models. The ML models were trained to obtain the optimized parameters. The trained models with optimal parameters were then applied to the validation datasets. Finally, the better performing models were used on the testing datasets of the Varanasi and Guntur sites, and their performances were assessed in terms of Pearson’s correlation coefficient (r), root mean square error (RMSE), and bias. SM maps were then generated using the optimal models.
3.1. Machine Learning algorithms used in this study
The advanced ML models such as SVM (Polynomial), SVM (Radial) and SVM (Sigmoid), RF, multi-layer perceptron (MLP), RBF, WM, SBC, ANFIS, HyFIS, and DENFIS were fitted to the training datasets for SM estimation at dual polarization (Table 1). These models were implemented through R packages. The packages “e1071” (Meyer et al., 2020), “randomForest” (Liaw and Wiener, 2002), and “RSNNS” (Bergmeir and Benítez, 2012) were used for the SVM regression models, RF regression, and MLP and RBF, respectively. The remaining algorithms (i.e. WM, SBC, ANFIS, DENFIS, and HyFIS) were modelled using “frbs” (Riza et al., 2015).
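The division of the Varanasi 2018–19 samples into training and validation subsets by random sampling without replacement can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the function name, the fixed seed, the 70/30 ratio passed as a default and the synthetic records are assumptions made only for this example.

```python
import random

def split_train_validation(samples, train_fraction=0.7, seed=42):
    """Randomly split samples into training and validation subsets without replacement."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    rng = random.Random(seed)      # fixed seed (assumed) so the split is reproducible
    shuffled = list(samples)       # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)          # a shuffle uses every sample exactly once
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical records: (VV in dB, VH in dB, in-situ SM in m3/m3); not data from the study.
records = [(-10.0 - 0.01 * i, -17.0 - 0.01 * i, 0.10 + 0.002 * i) for i in range(150)]
train, validation = split_train_validation(records)
print(len(train), len(validation))
```

Because the split is a slice of one shuffled copy, no sample can appear in both subsets, which is what sampling without replacement requires.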
Fig. 1. Location map of campaign sites with overlaid in-situ measurement locations for model training, validation and testing on the land use land cover map of 2015–16 (Source: Bhuvan WMS https://bhuvan-vec2.nrsc.gov.in/bhuvan/wms).
4. Results and discussions
4.1. Evaluation of the satellite and ground datasets
Sentinel-1 comprises a constellation of two polar-orbiting satellites (Sentinel-1A and 1B) in a near-circular sun-synchronous orbit at 693 km altitude with an inclination of 98.18° and an orbital period of 98.6 min. It carries a C-band (5.405 GHz) SAR with selectable dual polarization and a repeat cycle of 6 days (12 days for each satellite). It is a right-looking radar with incidence angles ranging between 20° and 46°. In the Interferometric Wide swath (IW) mode, it acquires single-look imagery with a swath of 250 km at a spatial resolution of 5 m × 20 m. In this mode, the incidence angle ranges from 29.1° to 46.0°. In this study, GRD products of the IW mode, which are Level-1 products, are used. The product is focused SAR data that has been detected, multi-looked and projected to ground range using an Earth ellipsoid model. It has a spatial resolution of 20 × 22 m and is available with a square pixel spacing of 10 × 10 m.
For the first study region, i.e., Varanasi, Sentinel-1 GRD data were downloaded from the ESA Copernicus Open Access Hub for two different years, 2018–2019 and 2019–2020 (two wheat life cycles, normally considered to run from October to May every year), for the dates 26/12/2018, 07/01/2019, 31/01/2019, 24/02/2019, 21/12/2019 and 11/01/2020. The field campaigns were carried out at nearly the same time as the Sentinel-1 overpasses of the study area. Each plot had homogeneous field conditions and an area much greater than that covered by a Sentinel-1 image pixel. Several measurements were made within each plot and averaged. The Steven’s HydraGo portable SM sensor was used for the in-situ measurement of volumetric SM. It measures point-based SM by a non-destructive method and operates on the dielectric impedance measurement principle, which allows the soil dielectric permittivity to be measured. The measured soil dielectric permittivity is converted into volumetric SM content by using complex computation with the help of a microprocessor.
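The conversion from dielectric permittivity to volumetric SM can be illustrated with the widely used Topp et al. (1980) polynomial for mineral soils. This is only a sketch under that assumption: the source does not state which calibration the sensor firmware applies, so the equation choice, the function name and the sample permittivities below are illustrative and not the instrument's actual algorithm.

```python
def topp_volumetric_sm(permittivity):
    """Volumetric soil moisture (m3/m3) from real dielectric permittivity.

    Uses the empirical Topp et al. (1980) polynomial, assumed here purely
    for illustration; the sensor's own calibration is not given in the text.
    """
    e = float(permittivity)
    if e < 1.0:
        raise ValueError("relative permittivity cannot be below 1")
    return -5.3e-2 + 2.92e-2 * e - 5.5e-4 * e ** 2 + 4.3e-6 * e ** 3

# Hypothetical permittivity readings, from a dry to a wet soil.
for eps in (5.0, 10.0, 20.0, 30.0):
    print(eps, round(topp_volumetric_sm(eps), 3))
```

The polynomial increases monotonically with permittivity, which reflects the physical basis of the measurement: water has a far higher permittivity than dry soil or air.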
For the Guntur site, the in-situ measurements were carried out from 22nd Feb to 03rd Mar 2018. During this in-situ campaign, two Sentinel-1 overpasses, on 17th Feb 2018 and 01st March 2018, were used, and the mean backscatter from both images was considered. During the campaign, in-situ measurements of volumetric SM were made at different locations. The plots were covered by Maize, Green Gram, Jowar, Paddy, etc. The details of the sites and in-situ data characteristics are given in Table 2.
A total of 150 sample datasets (backscattering coefficients at VV & VH polarization and in-situ measured SM) were collected at the Varanasi site during 2018–2019 on four different dates (26/12/2018, 07/01/2019, 31/01/2019, and 24/02/2019). The minimum, maximum, mean and standard
Table 1
List of ML algorithms used in this study.
Methods            References                            Package Name
SVM (Linear)       Vapnik (2000)                         e1071 (Meyer et al., 2020)
SVM (Polynomial)                                         e1071 (Meyer et al., 2020)
SVM (Radial)                                             e1071 (Meyer et al., 2020)
SVM (Sigmoid)                                            e1071 (Meyer et al., 2020)
Random Forest      Breiman et al. (1984)                 randomForest (Liaw and Wiener, 2002)
MLP                Hastie et al. (2009)                  RSNNS (Bergmeir and Benítez, 2012)
RBF                Broomhead and Lowe (1988)             RSNNS (Bergmeir and Benítez, 2012)
WM                 Wang and Mendel (1992)                frbs (Riza et al., 2015)
SBC                Chiu (1996), Yager and Filev (1994)   frbs (Riza et al., 2015)
ANFIS              Jang (1993)                           frbs (Riza et al., 2015)
HyFIS              Kim and Kasabov (1999)                frbs (Riza et al., 2015)
DENFIS             Kasabov and Song (2002)               frbs (Riza et al., 2015)
deviation (SD) values of in-situ SM (in m³ m⁻³) for the whole data sample are 0.035, 0.460, 0.246 and 0.104, respectively. The backscattering coefficients at VV polarization range from −17.01 dB to −4.44 dB, with a mean value of −10.23 dB and an SD of 2.71 dB, whereas those at VH polarization vary from −24.43 dB to −12.84 dB, with a mean value of −17.14 dB and an SD of 2.07 dB. The total datasets for 2018–2019 were divided into two independent datasets, namely training datasets (70% of the total) and validation datasets (30% of the total), for the training and validation of the statistical and ML algorithms.
To test the performance of the optimized algorithms, 63 (21st Dec 2019) and 67 (11th Jan 2020) in-situ SM measurements, with minimum, maximum, mean and SD values in m³ m⁻³ of (0.120, 0.510, 0.284, 0.091) and (0.163, 0.500, 0.366, 0.090), respectively, were considered for the Varanasi site. For the earlier date, the minimum, maximum, mean and SD values of the VH and VV backscattering coefficients in dB are (−21.96, −11.70, −16.79, 2.30) and (−12.94, −4.16, −8.06, 1.80), respectively. For the later date, the values are (−21.70, −13.74, −17.39, 1.68) and (−14.88, −6.14, −10.56, 2.01), respectively. Moreover, 148 in-situ measurements for the Guntur site were used for testing the optimized algorithms. The minimum, maximum, mean and SD values of SM (in m³ m⁻³) are 0.007, 0.589, 0.246 and 0.151, respectively. For this dataset, the minimum, maximum, mean and SD values of the backscattering coefficient in dB are (−23.06, −13.12, −16.13, 1.83) for VH and (−14.22, −7.34, −9.87, 1.26) for VV.
Fig. 3 shows the temporal variation of the spatially averaged measured SM and the spatially averaged Sentinel-1 VV and VH polarized backscattering coefficients for the Varanasi site (2018–2019) on four different dates. The stages of the wheat crop on the four sampling dates at the Varanasi site in the rabi season of 2018–19 are depicted in Fig. 4.
4.2. Optimization of model parameters
Prior to the estimation of SM, the selected models were optimized using the training datasets. In the present study, a trial-and-error approach, the most commonly used way of obtaining optimized model parameters, is used for the optimization of the models. For each method, the parameters were tuned to obtain the optimal values, which are given in Table 3. The SVM models with linear, polynomial, radial basis and sigmoid kernels were optimized by tuning the applicable parameters from the list of cost, epsilon, degree, gamma, and coef0. The RF model was tuned over two input parameters, namely mtry and ntree. The best results (r = 0.954, RMSE = 0.034 and bias = -0.001) were obtained for an ntree value of 200 and an mtry value of 2. For the MLP model, the number of neurons in each hidden layer
Table 2
Study sites and in-situ data characteristics.
Geographic Location                                        Varanasi, India                         Guntur, India
Satellite                                                  Sentinel-1                              Sentinel-1
Date of Imagery                                            26th Dec 2018, 07th Jan 2019,           17th Feb 2018,
                                                           31st Jan 2019, 24th Feb 2019,           01st March 2018
                                                           21st Dec 2019 and 11th Jan 2020
Date of Sampling                                           Coincident with the date of overpass    22nd Feb to 03rd Mar 2018
Total No. of in-situ datasets for Training and Validation  150 (all four dates)                    –
Total No. of in-situ datasets for Testing                  63 (21st Dec 2019),                     148
                                                           67 (11th Jan 2020)
Major Crop                                                 Wheat                                   Mixed (Maize, Green Gram, Jowar, Paddy, etc.)
Fig. 3. Temporal variations of spatially averaged in-situ SM and corresponding spatially averaged Sentinel-1 backscattering coefficients (VV and VH) for
Varanasi site (2018–2019).
(ranging between 1 and 600), the number of hidden layers (ranging from 1 to 4) and the maximum iterations (1000 and 2000) were considered in the optimization process. Overall, a good result was obtained with 600 neurons in each hidden layer, 1000 maximum iterations and 4 hidden layers, with performance indices of r (0.773), RMSE (0.068) and bias (0.021). For the RBF model, the best performance was found with the number of neurons in each hidden layer (15), maximum iterations (1000) and number of hidden layers (1).
The WM model was tuned for the number of labels, maximum iterations, and step size. The membership type (GAUSSIAN), t-norm type (HAMACHER), implication function type (ZADEH), and defuzzification type (WAM) were kept at their defaults during the optimization of the model. The optimization results reveal that the performance of
Fig. 4. Wheat crop at its different growth stages and three different locations.
Table 3
Optimised parameters during the training of different ML and statistical algorithms.
Methods Parameters
SVM (Linear) Kernel: linear, Cost: 1, Epsilon: 0.5
SVM (Polynomial) Kernel: polynomial, Degree: 3, Gamma: 0.5, cost: 1, Epsilon: 0.1, Coef0: 1
SVM (Radial) Kernel: radial, Gamma: 0.5, Cost: 3, Epsilon: 0.1
SVM (Sigmoid) Kernel: sigmoid, Gamma: 0.1, Cost: 1, Epsilon: 0.6
Random Forest mtry: 2, No. of tree: 200, Importance: TRUE
MLP No. of neurons per hidden layer: 400, No. of hidden layers: 4,
Max iteration: 2000, Initialization Function: Randomize_Weights,
Initialization Function Params: (-0.3, 0.3),
Learning Function: Std_Backpropagation, Learning Function Params: (0.2, 0),
update Function: Topological_Order, Update function Params: 0,
Hidden activation function: Act_Logistic, Shuffle patterns: TRUE,
output Act Func: Act_Logistic
RBF No. of neurons per hidden layer: 20, No. of hidden layers: 1, Max iteration: 1000,
Initialization Function: RBF_Weights,
Initialization Function Params: (0, 1, 0, 0.02, 0.04),
Learning Function: RadialBasisLearning,
learning Function Params.: (1e-05, 0, 1e-05,0.1, 0.8),
Update Function: Topological_Order, updateFuncParams: 0,
Shuffle Patterns: TRUE, linOut: TRUE
WM No. of labels: 10, Membership type: GAUSSIAN, Max iteration: 500, Step size: 0.01, t-norm type: HAMACHER, Implication
function type: ZADEH,
Defuzzification type: WAM
SBC Radius of neighbourhood (r.a): 0.5, Upper threshold: 0.5, Lower threshold: 0.15
ANFIS No. of labels: 10, Max iteration: 600, Step size: 0.001, Membership type: GAUSSIAN, t-norm type: MIN, Implication function type:
ZADEH, Defuzzification type: WAM
HyFIS No. of labels: 15, Max iteration: 1000, Step size: 0.01, t-norm type: MIN, Implication function type: ZADEH, Defuzzification type:
COG
DENFIS Distance threshold: 0.1, Max iteration: 3000, Step size: 0.01, d: 2
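The trial-and-error tuning that produced the settings in Table 3 amounts to evaluating a model over a set of candidate parameter values and keeping the combination with the lowest error. The sketch below shows that loop in Python with a simple Gaussian-kernel regressor standing in for the R models; the stand-in model, the function names, the `radius` parameter and the synthetic data are all assumptions for illustration and do not reproduce the study's tuning.

```python
import itertools
import math

def kernel_predict(train, query, radius):
    """Gaussian-kernel weighted mean of the training targets (stand-in model)."""
    num = den = 0.0
    for features, target in train:
        dist2 = sum((a - b) ** 2 for a, b in zip(features, query))
        weight = math.exp(-dist2 / (2.0 * radius ** 2))
        num += weight * target
        den += weight
    if den == 0.0:  # every weight underflowed: fall back to the training mean
        return sum(t for _, t in train) / len(train)
    return num / den

def validation_rmse(train, validation, **params):
    """RMSE of the stand-in model on the validation samples."""
    errors = [(kernel_predict(train, x, **params) - y) ** 2 for x, y in validation]
    return math.sqrt(sum(errors) / len(errors))

def grid_search(train, validation, grid):
    """Try every parameter combination and keep the one with the lowest RMSE."""
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_rmse(train, validation, **params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Synthetic samples: ((VV dB, VH dB), SM); hypothetical, not data from the study.
samples = [((-18.0 + 0.1 * i, -25.0 + 0.1 * i), 0.05 + 0.003 * i) for i in range(100)]
train, validation = samples[0::2], samples[1::2]
grid = {"radius": [0.05, 0.5, 5.0]}
best_params, best_score = grid_search(train, validation, grid)
print(best_params, best_score)
```

Adding further entries to `grid` extends the search to several parameters at once, at the cost of one model evaluation per combination.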
WM depends only on the number of labels. As the number of labels increases, the performance of the model increases. Increasing the number of labels corresponds to an increasing influence of individual data points, which leads to overfitting during training of the model. The optimized parameters were set to number of labels = 10, maximum iterations = 500, and step size = 0.01 in the present study. The performance of SBC depends only on the radius of neighbourhood and is independent of the upper and lower threshold values. With decreasing radius of neighbourhood, the performance of the model increases. The SBC model also showed reasonable overall performance for all values of the neighbourhood radius. The best results were observed with a radius of neighbourhood of 0.5, with slight overestimation.
The ANFIS model was also tuned for the number of labels, maximum iterations, step size, and membership type. The t-norm type (MIN), implication function type (ZADEH), and defuzzification type (WAM) were kept at their default values during optimisation. The best result was found for number of labels (10), maximum iterations (600) and step size (0.001) with membership type (GAUSSIAN), giving r, RMSE and bias of 0.749, 0.080 and 0.082, respectively. For the optimization of the HyFIS model, the parameters t-norm type (MIN), implication function type (ZADEH), and defuzzification type (COG) were kept fixed, and the remaining parameters (number of labels, maximum iterations and step size) were tuned. The optimum results were obtained for number of labels (15), maximum iterations (1000) and step size (0.010). Several parameters (Dthr, maximum iterations, step size and d) were used for the optimization of the DENFIS model. The best result was found at Dthr (0.1), maximum iterations (3000), step size (0.01) and d (2), with r (0.802, 0.804), RMSE (0.064) and bias (0). Increasing the maximum iterations from 3000 to 4000 did not change the performance of the model significantly; hence, 3000 was taken as the optimum value. The performance of the model was not convincing at a smaller step size (0.001) and a d value of 1.
4.3. Performance evaluation of different models
The SM values estimated using the training datasets were compared with the in-situ measured SM (used in the training datasets). The trained model was also verified by estimating SM for the validation dataset. The SM values estimated using the validation datasets were compared with the in-situ measured SM. The performance of the models was evaluated in terms of r, RMSE, and bias. A Taylor diagram is used for the graphical representation of the models' performance in terms of r, centred root-mean-square difference (cRMSD) and SD for the training and validation datasets, respectively. The Taylor diagram helps to identify the most accurate of several models. The r between modelled and observed data corresponds to the azimuthal
position of the modelled data. SD corresponds to the radial distance from the origin. The distance (in the same units as the SD) of the modelled data from the observed data represents their cRMSD (Taylor, 2001). Table 3 lists the optimum values of the model parameters for the estimation of SM using Sentinel-1A satellite data. The models optimized on the training datasets were used for the estimation of SM using the validation datasets. The performance results for all models are summarized in Table 4.
The performance of the SVM models with linear, polynomial, radial and sigmoid kernels was found to be almost similar. The performance of the RF model was better than that of the SVM models during training, whereas it was approximately similar to that of the SVM models for the validation datasets. The MLP model gave r, RMSE and bias values of 0.757, 0.071 and 0.004 for training and 0.668, 0.072 and 0.008 for the validation datasets, respectively. For RBF, the values of r, RMSE and bias were 0.772, 0.069 and 0.002 during training and 0.65, 0.074 and 0.009 for the validation datasets, respectively. The performance of the RBF model was slightly higher than that of the MLP for training and about the same for the validation samples.
The RF model shows the best performance, with r, RMSE and bias values of 0.955, 0.033, and 0.001, respectively, for the training samples. For the validation samples, RF shows good RMSE (0.075) and bias (-0.003) values. However, the correlation between estimated and observed SM decreases significantly from 0.955 for training to 0.654 for the validation datasets. The values of r, RMSE and bias of the WM model were 0.846, 0.054, and 0.001 for the training samples and 0.662, 0.074 and 0.008 for the validation samples, respectively. With the SBC model, r, RMSE and bias values of 0.790, 0.067 and 0.012 were obtained during training and 0.644, 0.075 and 0.002 for validation, respectively. The ANFIS model gave r (0.74), RMSE (0.083) and bias (0.020) for training and r (0.689), RMSE (0.075) and bias (0.005) for validation, respectively. In the case of DENFIS, the values of r, RMSE and bias were 0.805, 0.063 and 0.000 for training and 0.630, 0.077 and 0.007 for the validation datasets, respectively. The values of r, RMSE and bias computed using the HyFIS model were 0.844, 0.059 and 0.002 for the training datasets and 0.614, 0.083 and 0.002 for the validation datasets, respectively. The performances inferred from the Taylor diagram were also found to be similar to those indicated by the r, RMSE and bias values (Fig. 5 (a-b)). The SD values of observed SM are 0.107 and 0.098 for the training and validation datasets, respectively, whereas the SD of the SM estimated by the models varies from 0.043 (ANFIS) to 0.106 (HyFIS) for training and from 0.041 (ANFIS) to 0.093 (HyFIS) for validation. The value of cRMSD ranges from 0.033 (RF) to 0.080 (ANFIS) for training and from 0.070 (MLP) to 0.083 (HyFIS) for the validation datasets.
The three performance indices do not vary consistently across models, which makes it difficult to draw conclusive evidence for choosing the best performer. The selection of the best performing model was therefore based on the values of all three performance indices during training and validation. The RF model performed best among all the algorithms on the training datasets based on the RMSE and bias values (Table 4). For the validation datasets, the SBC model performed best among all models based on the combination of all the performance indices, followed by the RF, ANFIS, WM, SVM (Sigmoid) and HyFIS models.
4.4. Testing of algorithms using backscattering datasets for different dates and sites
For testing of the models, two Sentinel-1 images of the Varanasi region for two different dates (21st December 2019 and 11th January 2020) and two Sentinel-1 images of Guntur, Andhra Pradesh (17th Feb 2018 and 01st March 2018) were used. Fig. 6(a-c) and 6(d-f) show the scatter plots between the observed and estimated SM values obtained by the RF, SBC and ANFIS models for 21 December
Table 4
Comparative results for the training and validation datasets of SM obtained by the different optimized algorithms.

Methods            Training                                Validation
                   r      RMSE (m3/m3)  bias (m3/m3)       r      RMSE (m3/m3)  bias (m3/m3)
SVM (Linear)       0.751  0.070         0.000              0.673  0.073         0.053
SVM (Polynomial)   0.774  0.068         0.006              0.676  0.074         0.043
SVM (Radial)       0.790  0.066         0.019              0.644  0.077         0.031
SVM (Sigmoid)      0.752  0.071         0.018              0.661  0.073         0.033
Random Forest      0.955  0.033         0.003              0.654  0.075         0.014
MLP                0.773  0.068         0.007              0.695  0.071         0.043
RBF                0.772  0.069         0.008              0.650  0.074         0.036
WM                 0.846  0.057         0.003              0.662  0.074         0.033
SBC                0.790  0.067         0.048              0.644  0.075         0.009
ANFIS              0.740  0.083         0.083              0.689  0.075         0.020
HyFIS              0.844  0.059         0.007              0.614  0.083         0.009
DENFIS             0.805  0.063         0.001              0.630  0.077         0.027
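
The training and validation correlation values in Table 4 can be compared directly to see how well each model generalizes. The short Python sketch below uses the r values transcribed from Table 4 and computes the drop in r from training to validation. The names `R_VALUES` and `r_drop` are illustrative and are not part of the original study. A large drop, as seen for RF, is commonly read as a sign that a model fits the training samples more closely than it generalizes.

```python
# r values (training, validation) transcribed from Table 4.
R_VALUES = {
    "SVM (Linear)": (0.751, 0.673),
    "SVM (Polynomial)": (0.774, 0.676),
    "SVM (Radial)": (0.790, 0.644),
    "SVM (Sigmoid)": (0.752, 0.661),
    "Random Forest": (0.955, 0.654),
    "MLP": (0.773, 0.695),
    "RBF": (0.772, 0.650),
    "WM": (0.846, 0.662),
    "SBC": (0.790, 0.644),
    "ANFIS": (0.740, 0.689),
    "HyFIS": (0.844, 0.614),
    "DENFIS": (0.805, 0.630),
}


def r_drop(values):
    """Return {model: training r minus validation r}, rounded to 3 decimals."""
    return {name: round(train - val, 3) for name, (train, val) in values.items()}


drops = r_drop(R_VALUES)

# List the models from the largest to the smallest drop in r.
for name, drop in sorted(drops.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:18s} {drop:.3f}")
```

This only summarizes the reported correlations; it does not replace the combined assessment of r, RMSE and bias used to select the best-performing models.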
Fig. 5. Taylor diagrams evaluating the observed and predicted SM of the different models for the (a) training and (b) validation datasets.
2019 and 11 January 2020, respectively. The performances of these models during the testing dates are also depicted in Taylor diagrams (Fig. 7(a-b)), and the SM maps generated over the entire study area are presented in Fig. 8(a-b). Fig. 9(a), (b) and (c) show the generated SM maps, the scatter plot between estimated and in-situ SM, and the Taylor diagram, respectively, for the three models (RF, SBC and ANFIS) over the Guntur region. The values of the performance indices are in good agreement for both the Varanasi and Guntur sites during testing, although somewhat lower performance can be seen over the Guntur site. The performances of these three models are very close; however, SBC can be considered the best among them on the basis of its computational efficiency, implementation simplicity and consistency in performance. It can also be observed from the results that the SM values during the wet conditions were higher than those during the
Fig. 6. (a-f) Scatter plots between observed SM and estimated SM from the three best-performing models (RF, SBC and ANFIS).
Fig. 7. Taylor diagrams depicting the testing results of the three best-performing models for data collected on (a) 21 December 2019 and (b) 11 January 2020.
dry conditions. Areas closer to a river or lake have higher SM values than dry areas. The values of SM in the agricultural areas are in good agreement with the in situ measurements. Although the models are performing well, some errors in the measurements can be attributed to the spatial mismatch between the point-based measurements and the Sentinel-1 footprint, the time of the satellite overpass, agricultural activities in the area, etc.
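
The three performance indices used in this evaluation (r, RMSE and bias) can be computed from paired in situ and estimated SM values. The following Python sketch is a minimal illustration. It assumes that bias is defined as the mean of estimated minus observed SM, which is a common convention but is an assumption here, and the function name and sample values are hypothetical.

```python
import math


def performance_indices(observed, estimated):
    """Return (r, RMSE, bias) for paired observed and estimated SM values.

    Assumption: bias = mean(estimated - observed).
    """
    n = len(observed)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")

    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n

    # Pearson correlation coefficient from sums of centered products.
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    r = cov / math.sqrt(var_obs * var_est)

    # Root mean square error and mean bias, in the units of the inputs.
    rmse = math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)
    bias = mean_est - mean_obs
    return r, rmse, bias


# Hypothetical SM values in m3/m3, for illustration only.
obs = [0.10, 0.20, 0.30, 0.40]
est = [0.12, 0.22, 0.32, 0.42]
r, rmse, bias = performance_indices(obs, est)
print(f"r={r:.3f}, RMSE={rmse:.3f}, bias={bias:.3f}")
```

In this toy case the estimates are offset from the observations by a constant, so the correlation stays high while RMSE and bias both reflect the offset, which is why the three indices are read together.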
Fig. 8. Soil moisture maps generated by the three models (RF, SBC and ANFIS) using Sentinel-1 backscatter imagery (VV and VH) on two different dates: (a) 21 December 2019 and (b) 11 January 2020.
Fig. 9. Testing of the three models (RF, SBC and ANFIS) over Guntur, Andhra Pradesh: (a) estimated soil moisture maps using Sentinel-1 backscatter imagery (VV and VH), (b) scatter plots between estimated SM and in-situ SM and (c) Taylor plot showing performance during testing.
5. Conclusion

For the first time, a rigorous analysis of twelve different statistical and ML algorithms (SVM (linear), SVM (polynomial), SVM (radial), SVM (sigmoid), RF, MLP, RBF, WM, SBC, ANFIS, HyFIS, and DENFIS) was carried out for the estimation of SM using Sentinel-1 VH and VV backscatter in Indian cropping conditions. The in-situ measurement of SM was carried out over wheat crop fields at six different dates for two years with wide spatial and temporal coverage. The values of the performance indices were computed between model-estimated and in situ measured SM during training, validation and testing. The performance of the models was compared in terms of correlation coefficient, root mean square error and bias. Among these models, the performance of the SBC was found to be better than that of the other models used in this study. The best-performing models were then tested using the independent testing datasets over the Varanasi and Guntur sites. The overall results of the best-performing models were also found to be consistent during testing. SBC was also found to be computationally less expensive and simple, with consistent performance. The outcome of this study can be useful for monitoring SM in Indian cropping conditions and could provide assistance in agricultural services. In future, these algorithms should be calibrated and tested over different regions and crop types. In subsequent work, ensembles of different models will also be attempted.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements

The authors would like to acknowledge Airborne L&S SAR RA, SAC, ISRO, Ahmedabad for the financial support under the NASA-ISRO Synthetic Aperture Radar (NISAR) mission. The authors are also thankful to the Copernicus Open Access Hub for providing Sentinel-1 data.