1. Introduction
As the most important sugar crop, sugarcane is mainly grown in tropical and subtropical areas and provides approximately 80% of the world's sugar [1,2]. The growth of sugarcane is closely related to fertilizer, water, and radiation intensity. Evaluating the growth status of sugarcane in a timely manner, and adjusting field management strategies accordingly, is of great significance to sugarcane yield and quality. In recent years, remote sensing with spectral images at different scales has been considered an effective high-throughput phenotyping solution for predicting crop growth and yield.
Large-scale spectral imagery can cover an area ranging from 25 to 3600 km² per image. Spatial resolutions generally range from about 1 m to tens of meters, although some data can reach 0.3 m [3,4]. Such images are mainly obtained by satellites, such as Sentinel [5], Gaofen (GF) [6], Landsat [7,8], GeoEye [9], and QuickBird [10], and are normally provided by governments or commercial companies. Their main agricultural applications include land cover and land use investigation, vegetation classification, and crop yield forecasting. However, due to the limitations of low spatial resolution and fixed revisit cycles, large-scale imagery has serious deficiencies for small-scale applications, which usually require more subtle and frequent data acquisition for crop growth monitoring [11].
Middle-scale spectral imagery can provide data with spatial resolutions ranging from less than 1 m to a few meters [4,12]. These images are mainly acquired by manned aircraft platforms, integrated with multispectral or hyperspectral imaging sensors, flying at altitudes of several kilometers with fairly large coverage and high spatial resolutions [13,14]. However, such platforms are not popular due to their high costs.
Small-scale spectral imagery is usually acquired by unmanned aerial vehicles (UAVs) and can normally cover up to hundreds of hectares [3,4]. With the rapid development of UAVs, combined with the increasing availability and decreasing cost of spectral imaging sensors, opportunities to capture spectral images with high spatial and spectral resolutions have abounded. UAV-based remote sensing systems can easily reach centimeter-level spatial resolution, which means that they are more sensitive to spatially heterogeneous information. Over the past 10 years, UAVs, especially small drones, have been rapidly accepted and popularized for acquiring reliable field crop information, weather permitting [15]. They can provide subtle information about the crop canopies in every inch of a field, which is difficult to acquire via ground-based scouting, especially for tall plants. As such, these systems save labor and time [16,17].
Many previous studies have shown that crop yield [18,19], nitrogen (N) status [20,21,22], protein content [23,24] and water stress [25,26] can be predicted by drone-based multispectral and RGB imagery. When establishing models for different crops, various spectral features, including spectral reflectance, existing vegetation indices (VIs), and newly proposed VIs, can be used as input variables. Taking N prediction as an example, Peng et al. used the ratio vegetation index (RVI), the normalized difference red-edge index (NDRE) and the terrestrial chlorophyll index (TCI) to predict potato N status [27]. Zhang et al. used RVI and the normalized difference vegetation index (NDVI) to predict rice N status [28]. Osco et al. used NDVI, NDRE, the green normalized difference vegetation index (GNDVI), and the soil-adjusted vegetation index (SAVI) to predict maize leaf nitrogen concentration (LNC) [29]. For water status prediction, SAVI [25], the normalized green-red difference index (NGRDI) [26], NDVI [30], NDRE [31], and others were reported in different studies for different crops. One of the main reasons why different spectral features are used for different crops is that crops differ in their physiological characteristics and canopy distribution characteristics. Sugarcane is a tall and dense sugar crop. Unlike other crops, its stalk is the main raw material for sugar production and an important organ for accumulating nutrients. Sugarcane has a long growing season, blooms late, and for most of the time its canopy contains only leaves. Therefore, it is of practical significance to find suitable spectral features and establish corresponding growth prediction models for sugarcane.
Preliminary studies of remote sensing for sugarcane have also been conducted in recent years. Classification of sugarcane planting areas [32] and large-scale yield prediction [33,34] have been reported based on satellite images. Predictions of sugarcane canopy nitrogen concentration (CNC) or LNC based on hyperspectral data [35] or hyperspectral imagery [36] have also been reported. However, studies on CNC prediction and irrigation level classification based on high-resolution multispectral imagery have seldom been reported.
In terms of modeling algorithms, both traditional machine learning algorithms and newly developed deep learning algorithms have been used, each with its own advantages and disadvantages. Deep learning algorithms perform better when samples are sufficient. Ma et al. developed a county-level corn yield prediction model based on a Bayesian Neural Network (BNN) using multiple publicly available data sources over 20 years, including satellite images, climate observations, soil property maps and historical yield records [37]. Khaki et al. proposed a convolutional neural network model called YieldNet to predict corn and soybean yield based on MODIS products [38]. Yang et al. used one-year hyperspectral imagery to train a CNN classification model to estimate corn grain yield [39]. Prodhan et al. monitored drought over South Asia using a deep learning approach with 16 years of remote sensing data [40]. It can be seen that a large volume of image data, as well as ground-truth data spanning years, is commonly needed to provide a sufficient dataset for training a deep learning network. Collecting such a large number of data samples is very challenging; therefore, datasets adequate for deep learning are difficult to produce in some circumstances. By contrast, traditional machine learning methods, which are generally based on statistics, are suitable for most modeling problems when a relatively small number of samples is available [41,42]. Partial least squares (PLS), extreme learning machines (ELMs), backpropagation neural networks (BPNNs), support vector machines (SVMs), and others have been widely used in crop nutrient prediction. For example, Li et al. [43] used PLS to establish 12 models of fruits and seeds for rapid analysis and quality assessment. Kira et al. established a model for estimating the chlorophyll and carotenoid contents of three tree varieties based on BPNN [44]. Chen et al. constructed a BPNN model to invert rice pigment content with several spectral parameters as input [45]. Pal et al. used an ELM algorithm to classify land covers with multispectral and hyperspectral data [46]; it achieved better classification accuracy than models established with BPNN and SVM, with far less computational complexity. Different machine learning methods suit different cases depending on variable quantity, sample quantity, and the potential relationship between inputs and outputs.
In this study, in order to monitor the growth status of sugarcane canopies in a high-throughput manner, high-resolution multispectral images of an experimental sugarcane field were obtained by a low-altitude UAV. The objectives of this study were (1) to determine the sensitive spectral features for predicting the CNC and irrigation levels; (2) to establish prediction models of the CNC based on different machine learning algorithms, namely PLS, BPNN, and ELM; and (3) to establish classification methods for irrigation levels based on SVM and BPNN.
2. Materials and Methods
2.1. Study Area
The sugarcane experimental field was in Nanning, Guangxi Autonomous Region, China (latitude 22.84° N, longitude 108.33° E), as shown in Figure 1. From the captured multispectral image (displayed in RGB) on the right of Figure 1, it can be seen that the experimental field had 12 plots with concrete partitions. Three irrigation treatments and five fertilization treatments were applied in the field. Urea, calcium magnesium phosphate, and potassium chloride were chosen as the N, phosphorus (P), and potassium (K) fertilizers, respectively. Eight plots with different irrigation and fertilizer treatments and four blank plots without fertilizer or irrigation (denoted by BL) were set in the field. Concrete partitions at a depth of 1.2 m were built between plots to prevent water and fertilizer infiltration. The planting density was limited to 975,000 plants per hectare. The two irrigation treatments were 180 m³/ha (denoted by W0.6) and 300 m³/ha (denoted by W1.0), while the four fertilizer treatments were F1.0 (250 kg/ha of N, 150 kg/ha of P₂O₅, 200 kg/ha of K₂O), F0.9 (90% of the amount of F1.0), F1.1 (110% of F1.0) and F1.2 (120% of F1.0). Water and fertilizer were applied via drip irrigation pipes. Micronutrient fertilizers were equally applied to all plots except the blank plots. The eight plots had the same size of 20 m × 6 m, and their treatments were denoted by W0.6F0.9, W0.6F1.0, W0.6F1.1, W0.6F1.2, W1.0F0.9, W1.0F1.0, W1.0F1.1 and W1.0F1.2.
The seed canes were planted on 24 March 2018. The seedling fertilizers, which accounted for 30% of the total fertilizer application, were applied on 11 May (48 days after planting). The tillering fertilizers, which accounted for 70% of the total fertilizer application, were applied on 29 June (97 days after planting). The irrigation schedule is listed in Table 1.
Rainfall was the other source of water input in the open field. The rainfall in this field was 509.8 mm from the day of planting (24 March) to the day of image acquisition (11 July), and the monthly average rainfall was 127 mm. The meteorological conditions of the experimental field, including precipitation (excluding irrigation), temperature, and mean relative humidity, are shown in Figure 2. It can be seen that there was almost no rainfall for the 15 days before the day of image acquisition. As such, the last large water input was the controlled irrigation on 4 July, one week before canopy image acquisition. This means that rainfall had a very limited influence on the remote evaluation of water stress conditions under the specific irrigation amounts.
2.2. Data Collection
The multispectral images were captured at noon on 11 July 2018 (109 days after planting), in the elongating stage. The weather was sunny, cloudless, and windless. The image acquisition system was mainly composed of a Phantom 4 Pro drone (DJI, Shenzhen, China) and a RedEdge-MX multispectral image sensor (MicaSense, Seattle, WA, USA), as shown in Figure 3a,b, respectively. The RedEdge-MX sensor has five spectral bands at 475 nm (blue, B), 560 nm (green, G), 668 nm (red, R), 717 nm (red edge, RE), and 840 nm (near infrared, NIR), and is equipped with a light intensity sensor and a reflectance correction panel (Group VIII, USA, Figure 3c) for radiometric correction. The light intensity sensor can correct for the influence of changes in sunlight on the spectral images during a flight, and the fixed reflectance correction panel can be used for reflectance transformation. The drone flew at an altitude of 40 m, with 85% forward overlap and 85% side overlap. The time interval of image acquisition was 2 s, and the ground sample distance (GSD) was 2.667 cm. Four calibration tarps with reflectivities of 5%, 20%, 40% and 60% were also placed in the open space next to the field before image acquisition, as shown in Figure 3d. Two hundred and sixty multispectral images were finally collected.
2.3. Ground Sampling and CNC Determination
Each plot was divided into three sampling areas. Each sampling area was divided into nine grids, and one plant was randomly selected in each grid to collect the first fully unfolded leaf. The nine leaves thus collected formed one leaf sample per sampling area. A total of 36 samples were finally collected and immediately brought back to the laboratory for N determination. All the samples were oven-dried at 105 °C for 30 min and then at 75 °C for about 24 h until a constant weight was reached. The dried leaves were ground, and 0.3 g was weighed out; the Kjeldahl method [47] was used to determine the total nitrogen (TN, %) content. The TN of the first-leaf samples, which was taken as the CNC (%), was calculated by Equation (1):

$$TN\,(\%) = \frac{(V_1 - V_0) \times C \times 0.014}{m} \times 100 \quad (1)$$

where $V_1$ is the consumption volume of the acid standard solution, mL; $V_0$ is the titration blank volume, mL; $C$ is the concentration of the acid standard solution, mol/L; 0.014 is the mass of N (g) equivalent to 1 mL of 1 mol/L standard titration solution; and $m$ is the weight of the sample, g.
2.4. Multispectral Image Preprocessing
Pix4DMapper software (Pix4D, Prilly, Switzerland) was used to generate a mosaic image from the 260 original multispectral images, as shown in Figure 4. The mosaic image was then imported into and processed in ENVI software (L3Harris Technologies, Melbourne, FL, USA). Two preprocessing steps were conducted in ENVI: radiometric correction and geometric correction.
Radiometric calibration was implemented using the radiometric correction module in ENVI. The "empirical line" method was selected, since four calibration tarps with known reflectivity were captured in the image. An empirical line was fitted by comparing the DN values with the known reflectivities of the tarps; all DN values in the mosaic image could then be converted into reflectance.
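For readers reproducing this step outside ENVI, the sketch below illustrates the idea of the empirical-line method for one band. The tarp DN values here are hypothetical placeholders, and this is an illustration, not the ENVI implementation.

```python
import numpy as np

# Known reflectivities of the four calibration tarps (Section 2.2).
tarp_reflectance = np.array([0.05, 0.20, 0.40, 0.60])

# Mean DN values extracted over each tarp in one band
# (hypothetical values, for illustration only).
tarp_dn = np.array([4100.0, 14800.0, 29500.0, 44200.0])

# Fit the empirical line DN -> reflectance (gain and offset).
gain, offset = np.polyfit(tarp_dn, tarp_reflectance, deg=1)

def dn_to_reflectance(band_dn: np.ndarray) -> np.ndarray:
    """Convert raw DN values of one band to surface reflectance."""
    return np.clip(gain * band_dn + offset, 0.0, 1.0)
```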
Geometric correction was conducted to eliminate geometric distortion. Four ground control points were selected at the four corners of the field, as marked in the false-color image in Figure 4. The "image to map" function was used to implement geometric correction with the coordinate information of the ground control points. "Nearest neighbor" resampling, which avoids introducing new pixel values, was used to bring the image into the same coordinate system (UTM projection, WGS-84 datum) as the ground control points.
To separate the region of interest (ROI) from the background, a decision tree (DT) classification method was used to extract the sugarcane canopy from soil, weeds, shadow, concrete, and other interfering background features. Figure 5 shows the NDVI image of the extracted canopy; the white dots in the figure represent the 36 sampling areas. To increase the sample quantity, each area was further divided into nine grids of approximately 1.5 m × 2.0 m. The average value of each grid was taken as one spectral sample, giving a total of 324 spectral samples.
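As a rough illustration of canopy extraction and grid averaging, the sketch below substitutes a single NDVI threshold for the decision-tree rules actually used; the threshold value is an assumption for illustration, not a value from this study.

```python
import numpy as np

def canopy_mask(red: np.ndarray, nir: np.ndarray,
                ndvi_min: float = 0.6) -> np.ndarray:
    """Boolean mask of canopy pixels; a single NDVI threshold stands in
    for the decision-tree classifier (the 0.6 cut-off is illustrative)."""
    ndvi = (nir - red) / (nir + red + 1e-9)  # avoid division by zero
    return ndvi > ndvi_min

def grid_sample(band: np.ndarray, mask: np.ndarray) -> float:
    """Mean reflectance of canopy pixels within one 1.5 m x 2.0 m grid."""
    return float(band[mask].mean())
```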
2.5. Feature Extraction and Data Analysis Methods
2.5.1. Extraction of VIs
VIs have been widely used to qualitatively and quantitatively evaluate vegetation cover varieties and crop vigor. NDVI is the most commonly used VI, and it is also one of the important parameters closely related to crop chlorophyll and N concentration. Besides NDVI, nine other commonly used VIs (as shown in Table 2) were also selected, and their effects on predicting the CNC were compared. The optimal VI or combination of VIs was used to build the prediction models of the CNC and the irrigation levels.
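Most VIs of the kind listed in Table 2 are simple ratios or normalized differences of the five calibrated bands. The snippet below sketches a few indices whose definitions are standard in the literature; the remaining indices follow the same pattern.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def ndre(nir: np.ndarray, red_edge: np.ndarray) -> np.ndarray:
    """Normalized difference red-edge index."""
    return (nir - red_edge) / (nir + red_edge)

def gndvi(nir: np.ndarray, green: np.ndarray) -> np.ndarray:
    """Green normalized difference vegetation index."""
    return (nir - green) / (nir + green)

def ngbdi(green: np.ndarray, blue: np.ndarray) -> np.ndarray:
    """Normalized green-blue difference index."""
    return (green - blue) / (green + blue)
```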
2.5.2. Grey Relational Analysis
Grey relational analysis (GRA), also called grey incidence analysis (GIA), is an important part of grey system theory, which was developed by Julong Deng [57]. At its core, it determines the primary and secondary relationships between various factors by calculating the grey relational degree (GRD). The higher the GRD value of any two factors, the more consistent the changes between those two factors. Therefore, it can be used to select the factor with the greatest influence [58].
Let the reference sequence be $X_0 = \{x_0(k),\ k = 1, 2, \ldots, n\}$ and the comparison sequence be $X_i = \{x_i(k),\ k = 1, 2, \ldots, n\}$. The GRD value between $X_0$ and $X_i$ is calculated by Equations (2) and (3):

$$\xi_i(k) = \frac{\min_i \min_k |x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|} \quad (2)$$

$$GRD_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k) \quad (3)$$

where $\rho$ is the identification coefficient, with a value range of 0–1, taken here as 0.5.
The GRA was conducted between all the spectral features and the CNC, all of which were normalized beforehand. A GRD higher than 0.8 indicated that the VI had a very strong influence on the CNC.
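A minimal sketch of Equations (2) and (3) for one reference/comparison pair follows; with a single comparison sequence, the global minimum and maximum over all sequences reduce to those over that pair, which is a simplifying assumption of this version.

```python
import numpy as np

def grey_relational_degree(x0: np.ndarray, xi: np.ndarray,
                           rho: float = 0.5) -> float:
    """GRD between a normalized reference sequence x0 (e.g., the CNC)
    and one normalized comparison sequence xi (a spectral feature)."""
    diff = np.abs(x0 - xi)
    d_min, d_max = diff.min(), diff.max()
    # Equation (2): grey relational coefficient at each point k.
    coeff = (d_min + rho * d_max) / (diff + rho * d_max)
    # Equation (3): the GRD is the mean of the coefficients.
    return float(coeff.mean())
```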
2.5.3. Correlation Analysis
The correlation coefficient (R) [59] reflects the degree of linear correlation between two datasets. It can be calculated by Equation (4):

$$R = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \quad (4)$$

where $n$ is the sample size, $x_i$ and $y_i$ are the individual sample points indexed with $i$, and $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$ over the $n$ samples.
The higher the absolute value of R, the higher the linear correlation between the two factors. It is generally considered that 0.7 ≤ |R| ≤ 1 indicates a very high correlation, 0.4 ≤ |R| < 0.7 a significant correlation, and |R| < 0.4 a low correlation. Correlation analysis can serve multiple purposes in modeling: (1) to analyze the correlations between input variables and the predicted quantity in order to determine sensitive variables; (2) to analyze the correlations among multiple input variables, so that redundant, highly inter-correlated variables can be pruned to simplify the model; and (3) to analyze the correlation between the predicted and measured values of a model in order to evaluate its performance.
In this study, the correlations between the spectral features were analyzed to pick proper variables with less redundant information for CNC modeling and irrigation level classification.
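One plausible way to operationalize such redundancy screening is sketched below; the greedy ranking procedure and the 0.95 redundancy cutoff are illustrative assumptions, not the exact procedure used in this study.

```python
import numpy as np

def select_features(features: np.ndarray, cnc: np.ndarray,
                    redundancy_cutoff: float = 0.95) -> list:
    """Greedy redundancy filter over a (samples x features) matrix:
    rank features by |R| with the CNC, then drop any feature highly
    correlated with an already-kept one."""
    n_feat = features.shape[1]
    r_cnc = [abs(np.corrcoef(features[:, j], cnc)[0, 1])
             for j in range(n_feat)]
    order = np.argsort(r_cnc)[::-1]  # most CNC-relevant first
    kept = []
    for j in order:
        if all(abs(np.corrcoef(features[:, j], features[:, k])[0, 1])
               < redundancy_cutoff for k in kept):
            kept.append(int(j))
    return kept
```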
2.6. Modeling Algorithms
At present, there are many machine learning algorithms. Based on previous research, four algorithms were selected after comprehensive consideration, as shown in Table 3.
PLS, BPNN, and ELM were selected for CNC modeling, and the simple hold-out validation method [60] was selected for model validation. All 324 samples were divided into a calibration set and a validation set at a ratio of 7:3. SVM and BPNN were selected for irrigation level classification. Three-fold cross validation [61] was used to produce more validation samples and thereby generate a comprehensive confusion matrix of the classification results.
The PLS algorithm builds a model by minimizing the sum of the squared errors. It combines the advantages of multiple linear regression, canonical correlation analysis, and principal component analysis. BPNN has the characteristics of self-learning and self-adaptation, showing a strong ability to fit nonlinear functions; it also has strong anti-interference ability and may be suitable for complex field environments. The ELM algorithm randomly generates the weights and thresholds between the input layer and the hidden layer; users only need to specify the number of hidden-layer neurons in the whole training process. Compared with traditional classification algorithms, ELM has a fast learning speed and strong generalization capability. These three algorithms have different characteristics and might achieve better prediction results under different conditions or scenarios, so all three were adopted and compared for CNC prediction in this study. The number of principal components of the PLS model was 6. The training epochs, learning rate, and number of hidden-layer neurons of the BPNN model were 1000, 0.05, and 22, respectively. The transfer function and the number of hidden-layer neurons of the ELM model were the sigmoidal function and 50, respectively.
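As an illustration of the PLS workflow with the 7:3 hold-out split, the sketch below uses scikit-learn with placeholder arrays standing in for the 324 extracted spectral samples and measured CNC values; the placeholder data and random seed are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((324, 8))   # placeholder: spectral features per grid
y = rng.random(324)        # placeholder: measured CNC (%) per grid

# 7:3 hold-out split into calibration and validation sets.
X_cal, X_val, y_cal, y_val = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

pls = PLSRegression(n_components=6)   # 6 components, as in the text
pls.fit(X_cal, y_cal)
y_pred = pls.predict(X_val).ravel()

r_v = np.corrcoef(y_val, y_pred)[0, 1]                   # validation R
rmse_v = float(np.sqrt(np.mean((y_pred - y_val) ** 2)))  # validation RMSE
```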
SVM is a classic machine learning method for classification. It maps data from a low-dimensional space to a high-dimensional space through a kernel function and separates the classes with a decision surface that maximizes the margin between them. Thus, SVM was selected for irrigation level classification in this study. Due to its strong nonlinear mapping ability, BPNN is suitable not only for fitting problems but also for classification problems, so BPNN was also selected here for comparison with SVM in the classification of irrigation levels. The penalty factor and the kernel parameter of the SVM model were 10 and 0.167, respectively. The training epochs, learning rate, and number of hidden-layer neurons of the BPNN model were 1000, 0.1, and 10, respectively.
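A comparable sketch for the SVM classifier under three-fold cross validation is shown below; interpreting the reported kernel parameter 0.167 as the RBF gamma is our assumption, and the arrays are placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.random((324, 5))         # placeholder spectral features
y_irr = rng.integers(0, 3, 324)  # placeholder labels: BL, W0.6, W1.0

# Penalty factor 10 as reported; gamma = 0.167 is assumed to be the
# RBF kernel parameter.
svm = SVC(C=10.0, kernel="rbf", gamma=0.167)

y_hat = cross_val_predict(svm, X, y_irr, cv=3)  # 3-fold cross validation
cm = confusion_matrix(y_irr, y_hat)             # rows: actual; cols: predicted
```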
2.7. Accuracy Assessment Metrics
R and the root mean square error (RMSE) were used to evaluate the accuracies of the CNC prediction models. R was introduced in Section 2.5; here, the correlation between the predicted and actual values was calculated to evaluate the accuracies of the prediction models. RMSE, calculated by Equation (5), directly reflects the errors of the prediction models:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2} \quad (5)$$

where $\hat{y}_i$ and $y_i$ represent the estimated and actual values for sample $i$, respectively.
The confusion matrix is also known as the probability matrix or error matrix [62]. It is a specific matrix for visualizing algorithm performance and is often used to evaluate classification results. The rows of the matrix represent the actual irrigation levels and the columns represent the predicted irrigation levels. The confusion matrix is so named because it readily indicates whether multiple classes are confused (that is, one class is predicted as another). Common indicators, including producer's accuracy (PA), user's accuracy (UA), and overall accuracy (OA), can be calculated from the confusion matrix. PA refers to the ratio of the correctly classified sample number in a class to the actual total number of that class, also called the true positive rate (TPR). UA refers to the ratio of the correctly classified sample number in a class to the total number classified into that class, also called the positive predictive value (PPV). OA refers to the ratio of all correctly classified sample numbers to the total sample number over all classes. The calculation formulas of the three indicators are shown in Equations (6)–(8):

$$PA = \frac{TP}{TP + FN} \quad (6)$$

$$UA = \frac{TP}{TP + FP} \quad (7)$$

$$OA = \frac{TP + TN}{TP + FP + TN + FN} \quad (8)$$

where TP, FP, TN and FN represent the numbers of true positive, false positive, true negative, and false negative samples in the classification result, respectively.
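Given a multi-class confusion matrix with rows as actual classes and columns as predicted classes, the per-class PA and UA and the overall OA of Equations (6)–(8) can be computed as sketched below.

```python
import numpy as np

def accuracy_metrics(cm: np.ndarray):
    """PA and UA per class, plus OA, from a confusion matrix
    (rows: actual classes; columns: predicted classes)."""
    correct = np.diag(cm)
    pa = correct / cm.sum(axis=1)   # Eq. (6): TP / (TP + FN), per class
    ua = correct / cm.sum(axis=0)   # Eq. (7): TP / (TP + FP), per class
    oa = correct.sum() / cm.sum()   # Eq. (8): all correct / all samples
    return pa, ua, oa
```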
4. Discussion
The results in Table 6, Table 7 and Table 8 indicated that the PLS models had the best performance for sugarcane CNC prediction based on the different input combinations, which is consistent with research into citrus CNC prediction by Liu et al. [63] and grapevine LNC prediction by Moghimi et al. [64]. The sugarcane CNC prediction model showed a highest accuracy of R = 0.79 and RMSE = 0.11, which is also close to the previous results of Liu et al. (R = 0.65, RMSE = 0.13) [63] for citrus CNC prediction and of Moghimi et al. (R = 0.74, RMSE = 0.23) [64] for grapevine LNC prediction.
Compared to the five-band reflectance models in Table 6, the VI-based models in Table 7 had obviously lower accuracy, indicating that taking VIs alone as the input decreased the prediction accuracy compared to taking the entire five-band reflectance as the input. This suggests that the VIs did not play to their strengths, which are to enhance spectral features and reduce environmental interference [65]. VIs do have advantages in reducing the influence of uneven illumination and differing backgrounds; however, in this study, only one image of a small field was acquired in a very short time, so the differences in illumination and background were not significant. This made the contribution of the VIs smaller than that of the whole spectral reflectance [66,67].
Regardless, VIs could still help to improve the modeling accuracy, as shown by the results listed in Table 8. Among the different combinations of input spectral features, the five-band reflectance combined with the three VIs (SRPI, NPCI, and NGBDI) had the highest accuracy in CNC prediction, with an Rv of 0.79 and an RMSEv of 0.11; this was 8.2% higher in Rv, and 15.4% lower in RMSEv, than the five-band prediction model (Rv = 0.73, RMSEv = 0.13).
Moreover, all three of those VIs were calculated from the visible bands (SRPI and NPCI are both calculated from the green and red bands, while NGBDI is calculated from the blue and green bands), indicating that the visible bands contained more sensitive information for sugarcane CNC prediction. Ranjan et al. [68] explored the spectral characteristics of CNC in crops, with characteristic wavelengths mainly at 430 nm, 460 nm, 640 nm, 910 nm, 1510 nm, 1940 nm, 2060 nm, 2180 nm, 2300 nm, and 2350 nm. In this study, the multispectral camera had a spectral range of about 450–850 nm, which contained only the visible sensitive bands among these. This supports the rationality of the selected VIs being concentrated mainly in the visible range. As is generally known, different N inputs can lead to different leaf pigment concentrations, leaf internal structures, and canopy structures [64,69,70]. The visible bands are closely related to leaf pigments and canopy structures and, as such, offer great potential for N prediction.
Furthermore, this research also found that the irrigation levels could be effectively classified based on the reflectance of the red and blue bands combined with SRPI, NPCI, and NGBDI, all of which are spectral features in the visible bands. Indeed, the NIR bands are generally more sensitive to plant water content; however, the most sensitive bands, which sit between 1480 and 1500 nm, are out of the range of the multispectral camera used in this study [69,70]. Insufficient water input obviously affects plant metabolism, which indirectly affects leaf pigment concentrations. Therefore, the irrigation level recognition model could achieve good classification accuracy using only the three visible bands.
This study achieved good results for CNC prediction and irrigation level classification based on multispectral remote sensing. Although research on crop monitoring based on UAV multispectral imagery has been widely carried out for more than 10 years, it is rarely applied in wide-field management, and several bottlenecks need to be addressed at present. Sozzi et al. [4] compared the advantages and disadvantages of satellite-, plane-, and UAV-based multispectral imagery for variable-rate N application in terms of cost, economic benefit, optical quality, and usage scenarios. They pointed out that although satellite- and plane-based imagery have lower optical quality and resolution, they can provide applicable variable N rate suggestions and bring economic benefits for large-scale farms due to their relatively low cost. The UAV platform does have limits in acquisition cost and flight coverage at present. However, as UAV technology develops and the demand for UAVs increases, costs can be significantly reduced and battery performance enhanced in the future. With the emergence of automated UAV base stations and the reduction of image processing costs, the large-scale application of UAVs is just around the corner. By then, UAV remote sensing technology can be widely accepted in farm-scale crop monitoring for its flexible and autonomous acquisition style, high-quality image data, and low operating cost.