Chen 2020
ARTICLE INFO

Keywords: Occurrence of pests and diseases; Climate; Atmosphere circulation; Pest counting; Bi-directional long-short term memory

ABSTRACT

The occurrence of crop pests and diseases seriously affects the development of agriculture, and pest meteorology has shown that climate is an important factor in their occurrence. Recently, the recurrent neural network (RNN), which was designed for modeling sequential data, has been broadly applied in various fields and has proved quite efficient for time series problems. This paper proposes to use a bi-directional RNN with long short-term memory (LSTM) units to predict the occurrence of cotton pests and diseases from climate factors. First, the problem of occurrence prediction of pests and diseases is formulated as time series prediction. Then a bi-directional LSTM network (Bi-LSTM) is adopted to solve the problem, since it can capture long-term dependencies on the past and future contexts of sequential data. Experimental results showed that Bi-LSTM performs well on the occurrence prediction of pests and diseases in cotton fields, yielding an Area Under the Curve (AUC) of 0.95. This work further verified that climate indeed has a strong impact on the occurrence of pests and diseases, and that circulation parameters also have a certain influence.
1. Introduction

With global warming, the frequency of regional crop pests and diseases has increased rapidly, causing great losses in agricultural production. Crop pests and diseases are among the major natural disasters in China; their occurrence, development, and epidemics are closely related to climate conditions or coincide with meteorological disasters. Investigating the relationship between pandemic diseases and climate is significant for establishing climate-pest forecasting models and improving the long-term prediction of pests and diseases.

Cotton is an important economic crop that occupies a vital position in the national economy of China. However, cotton is always endangered by various pests and diseases during its growth. Perennial pests and diseases have caused about 15–20% economic loss, and even up to 50%, in recent years. Therefore, the control of pests and diseases is crucial to the growth of cotton; such control can recover more than 900,000 tons of cotton annually (Cui et al., 2007).

During the growth of cotton, many factors affect production, the most significant of which is abnormal climate change. Abnormal climate change can drive the continuous evolution of pests and make them better adapted to the environment. In addition, studies have shown that circulation parameters have a certain correlation with the occurrence of crop pests and diseases (Zhou and Gao, 2014). All of these factors seriously influence the yield and quality of crop production and make it very difficult to control pests and diseases (Wu et al., 2009).

Nowadays, the methods of controlling pests and diseases in cotton mainly include pesticide screening, ecological control, biological control and artificial trapping (Luo et al., 2017). Pesticides are used most often because they are effective and act directly when applied in cotton fields, but most pesticides are highly toxic and often cause serious residual pollution. Subsequently, attempts were made to develop new types of pesticides that are highly efficient, low in toxicity and environment-friendly for the prevention and control of pests and diseases. With the rapid development of life sciences, biological control has become a popular
⁎
Corresponding authors.
E-mail addresses: [email protected] (P. Chen), [email protected] (C. Xie), [email protected] (B. Wang).
1
These authors contributed equally to this work.
https://doi.org/10.1016/j.compag.2020.105612
Received 12 December 2018; Received in revised form 15 May 2020; Accepted 28 June 2020
0168-1699/ © 2020 Elsevier B.V. All rights reserved.
P. Chen, et al. Computers and Electronics in Agriculture 176 (2020) 105612
direction. Cotton bollworm, commonly known as cotton leaf hopper, is a severe pest of cotton and okra. Singh et al. (2018) evaluated 15 common housekeeping genes during different developmental stages of bollworm and tried feeding or injecting sequence-specific double-stranded RNA (dsRNA). Targeting such essential genes can help in developing new-generation insect-resistant transgenic plants that downregulate or knock down essential genes to cause mortality. Moreover, much research has addressed releasing natural enemies into cotton fields and exploring habits and resources related to habitat control to attract natural enemies, which have played an important role in practice (Feng, 2008; Gao et al., 2016).

Although the control of cotton pests and diseases has achieved good results, pests and diseases often occur in sudden and complex ways. With the development of computing, machine learning-based methods hold promise for the control and prevention of cotton pests and diseases. Extensive studies have focused on the pest and disease prediction of crops. Hang et al. proposed a convolutional neural network-based model to predict apple leaf diseases (Hang et al., 2019). Li et al. presented a deep learning-based pipeline for the visual localization and counting of agricultural pests by self-learning a saliency feature map from a convolutional neural network (Li et al., 2019). Xie et al. built a model with multi-level learning features for automatic classification of field crop insects (Xie et al., 2018). Ding et al. proposed an automatic detection pipeline based on deep learning for identifying and counting pests in images taken from field traps (Ding and Taylor, 2016), which can monitor and warn of the occurrence of pests in the field in real time. Zhang et al. developed three models, multilayer feed-forward neural networks (MLFN), general regression neural networks (GRNN) and support vector machine (SVM), to predict the occurrence area of Dendrolimus superans, where SVM performed the best (Zhang et al., 2017).

As a special kind of RNN, the LSTM neural network has been verified to be efficient in modeling sequential data (Hochreiter and Schmidhuber, 1997); it introduces a gate mechanism into the vanilla RNN to prevent the vanishing or exploding gradient problem. Moreover, the Bi-LSTM neural network (Bin et al., 2016), derived from the LSTM network, has advantages in memorizing information for long periods in both directions, and shows a clear improvement over LSTM for video description. Bi-LSTM has achieved good results in different fields. Jiang et al. applied a character-level bi-directional LSTM network to represent tokens and classify tags for each token, and the LSTM-based system achieved a micro-F1 score of 0.8986 in the i2b2 strict evaluation (Jiang et al., 2017). Zhao et al. designed a deep neural network structure, named Convolutional Bi-directional Long Short-Term Memory networks (CBLSTM), to process raw sensory data for monitoring machine health, and experimental results showed that the model outperforms several state-of-the-art baseline methods (Zhao et al., 2017). Xie et al. designed a deep neural network approach with the state-of-the-art Bi-LSTM network to extract e-cigarette safety information from social media, which achieved the best performance compared with three baseline models, with a precision of 94.10%, a recall of 91.80% and an F-measure of 92.94% (Xie et al., 2018).

This paper proposes a Bi-LSTM network-based method to predict the occurrence of cotton pests and diseases. An improved Bi-LSTM-based neural network was designed with fully connected layers to form a classification model, using climate factors and some atmosphere circulation parameters. Results showed that our model outperformed other traditional prediction models for the occurrence prediction of cotton diseases and insect pests.

2. Material and method

2.1. Dataset of pest images

The occurrence of crop pests and diseases is affected by a number of climate factors, such as temperature, humidity, rainfall and light. In addition, circulation parameters also affect the occurrence of pests in specific ways. However, the interaction and influence of the various factors on pests and diseases are very complex. Studying the impact of climate factors and circulation parameters on agriculture is of great significance to the effective production of crops.

For climate factors, the datasets of cotton pests and diseases from the Crop Pest Decision Support System (http://www.crida.in:8080/naip/AccessData.jsp) were used: 15,343 cotton records, taken weekly for 10 insect pests and diseases along with the corresponding climate factors across 6 important locations in India, were investigated in this work. Several climate factors are used for the occurrence of pests, including Maximum Temperature (MaxT (°C)), Minimum Temperature (MinT (°C)), Relative Humidity in the morning (RH1 (%)), Relative Humidity in the evening (RH2 (%)), Rainfall (RF (mm)), Wind Speed (WS (kmph)), Sunshine Hour (SSH (hrs)) and Evaporation (EVP (mm)). The historical records can be used to predict the future occurrence of pests and diseases. From the website, a total of 63 datasets with time-series records of cotton pests and diseases were obtained. Tables 1 and 2 provide simple statistics on the different areas and types of cotton pests and diseases, respectively.

Moreover, 74 atmosphere circulation indexes from the China Meteorological Administration National Climate Center (http://cmdp.ncc-cma.net/cn/index.htm) were used as additional factors. They record the parameters of atmospheric circulation intensity in different regions, e.g., the Indian Subtropical High Area Index, the Northern Hemisphere Subtropical High Intensity Index and the Indian Subtropical High Intensity Index. The website provides the 74 circulation indexes monthly from 1951 to the present. Table 3 lists the top 25 circulation indexes selected by the random forests-based embedded method.

2.2. Data process and feature selection

Our research data mainly include two parts: weekly climate-pest records and monthly circulation parameter documents. To unify the time dimension and obtain enough data to train the network, the monthly circulation parameters were expanded into weekly statistics by interpolation. As a result, weekly records of 8 climate parameters, 74 circulation parameters and the corresponding cotton pest or disease values were obtained.

2.3. Climate factor combination

According to plant disease and pest meteorology, there is a certain relationship between climate factors and the occurrence of pests and diseases, which has been verified by many studies (Kelly et al., 2015; Prasetyo et al., 2017). In order to improve model performance on occurrence prediction of cotton pests and diseases, some combinations of

Table 1
Cotton pest and disease datasets in different areas.

Location      Number of samples
Akola         2028
Coimbatore    208
Lam           5265
Nagpur        3328
Pharbhani     2644
Sirsa         1870
Total         15343
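The expansion of monthly circulation indexes into weekly values described in Section 2.2 can be illustrated with a small sketch. The paper does not state which interpolation scheme was used, so linear interpolation is an assumption here, and the function name `monthly_to_weekly`, the day stamps and the index values are all hypothetical.

```python
import numpy as np

def monthly_to_weekly(month_days, month_values, week_days):
    """Linearly interpolate monthly index values onto weekly time stamps.

    Assumption: linear interpolation; the paper only says "interpolation".
    """
    return np.interp(week_days, month_days, month_values)

# Hypothetical example: three monthly readings of one circulation index,
# stamped by day of year, expanded to one value every 7 days.
month_days = np.array([15.0, 45.0, 75.0])
month_values = np.array([10.0, 40.0, 10.0])
week_days = np.arange(15.0, 76.0, 7.0)   # 15, 22, ..., 71

weekly = monthly_to_weekly(month_days, month_values, week_days)
print(weekly)
```

Each of the 74 circulation indexes would be processed the same way, so that every weekly climate-pest record has a matching row of circulation values.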
Table 2
The types of pest and disease datasets.

Type of pest and disease    Number of samples
Bollworm                    7183
Aphid                       1032
Jassid                      1974
Thrips                      832
Whitefly                    1508
Spodoptera                  630
Mealybug/Miridbug           260
LeafBlight/LeafSpot         1924
Total                       15343

Table 4
The distribution of records at different levels of pests and diseases in cotton.

Cotton pest   No pest   A little (< 5%)   General (5–20%)   Serious (> 20%)   Total
Records       10974     1855              1209              1305              15343

2.4. Circulation parameter selection

Because redundant features hamper model training, the top circulation features were selected instead of all 74 original features. Redundant features not only fail to contribute to the prediction of insect pests but may also degrade the training performance of our model. All experiments in this work were implemented in Python, where the sklearn module provides a large number of methods for feature selection. Here, the random forests-based embedded method from the feature selection library was adopted for variable selection. This method mainly has two functions: providing insights into the behavior of the variable importance index based on random forests, and proposing a ranking strategy of explanatory variables together with a stepwise ascending variable introduction strategy (Genuer et al., 2010). Eventually 25 circulation features were obtained, as shown in Table 3. The top 25 circulation features and climate factors are fused to predict the occurrence of cotton pests and diseases.

2.5. The definition of occurrence level of pests and diseases

2.8. Architecture of pest/disease prediction with Bi-LSTM Network

Recurrent Neural Networks (RNNs), and specifically a variant with Long Short-Term Memory (LSTM), are enjoying renewed interest and have been successfully applied to a wide range of machine learning problems that involve sequential data (Karpathy et al., 2015). LSTM is a recurrent neural network architecture (an artificial neural network) published in 1997 by Hochreiter and Schmidhuber (Hochreiter and Schmidhuber, 1997), and has recently been refined and promoted by Alex Graves (Graves, 2013). Like most RNNs, LSTM has a memory function that can be used to handle time series problems (Sutskever et al., 2014). Unlike traditional RNNs, LSTM is well suited to long-term dependency problems because it can solve the problems of gradient vanishing (Hochreiter and Schmidhuber, 1997; Bengio et al., 1994; Hochreiter, 1991) and gradient exploding (Mikolov, 2012; Pascanu
et al., 2013) caused by Back Propagation Through Time (BPTT). The Bi-LSTM neural network (Bin et al., 2016) is similar to the LSTM network in structure, and both are constructed with LSTM units (Schuster and Paliwal, 1997). The special unit of this network is capable of learning long-term dependencies without keeping redundant information. Bi-LSTM works end to end like other neural networks: it automatically processes input data and yields the desired results (Miao et al., 2015). Compared with traditional machine learning, it does not require complex feature selection and model testing. Once the Bi-LSTM network has been trained, it only needs to update the network parameters with new data, without having to rebuild the model. In recent years, researchers have proposed improved structures of the LSTM unit, e.g., the Gated Recurrent Unit (GRU) (Chung et al., 2014), making it more applicable and more efficient in prediction performance and training time.

2.8.1. LSTM unit

To capture the potential relationship between the time series data of climate factors and pest/disease values, LSTM was used in this work, where each LSTM unit contains three gates. For input $x_t$, the input gate decides which input enters the current cell, $i_t = \sigma(W^i \times [h_{t-1}, x_t] + b^i)$; the forget gate decides whether and how much information from the previous memory is forgotten, $f_t = \sigma(W^f \times [h_{t-1}, x_t] + b^f)$; and the output gate controls the information output from the current cell, $o_t = \sigma(W^o \times [h_{t-1}, x_t] + b^o)$. The gating operation ultimately determines which information is forgotten and which information enters the neural network as useful information. The sigmoid function can be expressed as $\sigma(x) = \frac{1}{1 + e^{-x}}$. For the climate-pest/disease forecasting issue, the unit processes a series of temporally dependent inputs $x_t$ at time $t$ together with the hidden vector $h_{t-1}$ from the previous time step; the memory vector of the LSTM cell is therefore iterated as $C_t = f_t \times C_{t-1} + i_t \times c_t$, where $c_t = \tanh(W^c \times [h_{t-1}, x_t] + b^c)$ is a new short-term state vector at time $t$. The tanh function can be expressed as $\tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$. As a result, the output vector of the cell can be further predicted (Hochreiter and Schmidhuber, 1997) as:

$$h_t = o_t \times \tanh(C_t), \qquad (2)$$

where $W$ is the recurrent weight matrix; $b$ is the corresponding bias vector; the superscripts $i$, $f$ and $o$ denote the input, forget, and output gates, respectively; and $C$ and $h$ are the memory vector and output vector of the cell, respectively.

2.8.2. Bi-LSTM network

A Bi-LSTM consists of two LSTM units that run in parallel, one on the input sequence and the other on the reverse of the input sequence. At each time step, the hidden state of the Bi-LSTM is the concatenation of the forward hidden state $(h_{t-1}, C_{t-1})$ and the backward hidden state $(h_{t+1}, C_{t+1})$, which allows the hidden state to capture both past and future information. The Bi-LSTM network was designed to capture the information of a sequential dataset and maintain climate-pest/disease features from the past and the future.

As described in Schuster and Paliwal (1997) and Kalchbrenner et al. (2015), the output of Bi-LSTM, $(h_t, C_t)$, can be represented as a whole function LSTM(*):

$$(h_t, C_t) = \mathrm{LSTM}([h_{t-1}, h_{t+1}, x_t], C_{t-1}, C_{t+1}, W), \qquad (3)$$

where $W$ concatenates the four weight matrices $W^i$, $W^f$, $W^o$ and $W^c$.

2.8.3. Bi-LSTM based classifier

Fig. 1 illustrates the structure of the proposed network. Let $(X, Y)$ be the data input, where $X$ denotes the records of 38 feature vectors (12 climate features, 25 circulation parameters and the historical value for cotton pests and diseases), and $Y \in \{0, 1, 2, 3\}$ denotes the pest/disease hazard level. Feature selection and feature combination were applied to the feature vectors to select proper features for predicting the pest/disease level. Normalization and standardization were used to unify the different statistical data, which ensures comparability between different types of pests and diseases. Before the data are input into the network, the time series forecasting problem for pests and diseases must be reframed as a supervised learning problem. The "Reshape" technique was used to change the data into the required format. For an RNN, the time series records should be converted to a 3D tensor $(N_{samp}, timesteps, N_{feat})$. In this work, $N_{samp}$, the number of samples, is 15,343, $timesteps$ is 4 and $N_{feat}$ is 38, comprising 37 features and one pest/disease value. The Bi-RNN has advantages in dealing with small data samples. Therefore, a Bi-RNN with 6 Bi-LSTM units and a fully connected layer with 4 nodes were combined to build a basic LSTM network block (Zhang et al., 2017). The former captures the temporal relationship between different features and the occurrences of pests and diseases. Since the output of the LSTM layer is a vector, a fully connected layer was adopted to abstract and combine the output vectors. The latter also reduces the dimensionality of the output and then maps the reduced vector to a final prediction. In addition, the "Dropout" technique was introduced into the Bi-LSTM block to avoid over-fitting. The final prediction can be defined as below:

$$(h_i, C_i) = \mathrm{LSTM}([h_{i-1}, h_{i+1}, x_i], C_{i-1}, C_{i+1}, W), \qquad (4)$$

$$prediction = \mathrm{softmax}(W_{fc} \times h_l + b_{fc}), \qquad (5)$$

where $(h_i, C_i)$ stands for the output of the $i$-th cell of Bi-LSTM; softmax(*) is the softmax function; $h_l$ is the hidden vector in the last time step of the Bi-LSTM layer; $W_{fc}$ and $b_{fc}$ are the weight matrix and bias term of the fully connected layer, respectively; and $prediction \in \{0, 1, 2, 3\}$, after one-hot encoding, represents the classification result of the Bi-LSTM network.

2.8.4. Architecture of the Bi-LSTM Network

The occurrence prediction of cotton pests and diseases is regarded as a time series problem, which uses the historical climate data and pest counting values or percentages of disease area to identify whether pests and diseases will occur in the future. The length of the historical observations used for the occurrence prediction must be determined. A longer history generally improves the prediction but also requires more computation. Here "timesteps" is set as 4, i.e., four consecutive records are input together into the Bi-LSTM. In addition, three parameters for the whole structure of the network should be determined: the number of Bi-LSTM layers $l_r$, the number of fully connected layers $l_{fc}$ and the corresponding number of hidden units, denoted by $units_r$.

In order to train the network, some critical parameters have to be determined, such as the optimization method, learning rate and batch size. The traditional optimization method for deep neural networks is stochastic gradient descent (SGD) (Ruder, 2016), which is the batch version of gradient descent. The gradient descent update of the network parameters is:

$$g_t = \frac{d f_t(\theta)}{d\theta}, \qquad \theta_t = \theta_{t-1} - \eta \times g_t, \qquad (6)$$

where $f_t(\theta)$ is the objective function used in the Bi-LSTM network; $\eta$ is the learning rate; and $\theta$ is the parameter vector of the network.

Here, categorical cross-entropy was adopted as the loss function of the classification, defined as:

$$f_t(\theta) = -\sum_{i=1}^{n} \sum_{t=1}^{c} \left( y^{true}_{i,t} \times \log\left(y^{prediction}_{i,t}\right) \right), \qquad (7)$$

where $n$ is the number of samples; $c$ is the number of categories; $y^{true}_{i,t}$ is the actual value; and $y^{prediction}_{i,t}$ is the prediction value of the network, which is calculated by Eq. (5).

However, SGD has many disadvantages in a real training process. For example, it uses the same learning rate for all parameter updates, and it is difficult to find the best learning rate for non-stationary objectives and different features. Sometimes, it falls into a local optimal solution.
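As an illustration of the gate equations in Section 2.8.1, one LSTM update can be written in a few lines of NumPy. This is a minimal sketch and not the authors' TensorFlow implementation: the dictionary layout of the weights, the random initialisation and the function name `lstm_step` are assumptions, and the "×" between gate vectors is read as an element-wise product.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM update. W[k]: (n_hidden, n_hidden + n_feat), b[k]: (n_hidden,)."""
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])     # input gate
    f_t = sigmoid(W["f"] @ z + b["f"])     # forget gate
    o_t = sigmoid(W["o"] @ z + b["o"])     # output gate
    c_t = np.tanh(W["c"] @ z + b["c"])     # new short-term state
    C_t = f_t * C_prev + i_t * c_t         # memory vector update
    h_t = o_t * np.tanh(C_t)               # Eq. (2)
    return h_t, C_t

# Hypothetical sizes taken from the text: 38 features, 6 hidden units, 4 timesteps.
rng = np.random.default_rng(0)
n_feat, n_hidden = 38, 6
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_feat)) for k in "ifoc"}
b = {k: np.zeros(n_hidden) for k in "ifoc"}

h, C = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.random((4, n_feat)):        # forward pass over one window
    h, C = lstm_step(x_t, h, C, W, b)
```

A bi-directional layer would run a second cell with its own weights over the reversed window and concatenate the two final hidden vectors before the fully connected layer.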
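The reframing of the weekly records as a supervised learning problem with a 3D tensor (Section 2.8.3) can be sketched as a sliding window. The paper does not spell out how windows and targets are paired, so this sketch assumes that each sample stacks four consecutive weeks and that the target is the hazard level of the following week; `make_windows` and the random data are hypothetical.

```python
import numpy as np

def make_windows(records, labels, timesteps=4):
    """Stack `timesteps` consecutive weekly records into one sample.

    records: (n_weeks, n_feat) array; labels: (n_weeks,) hazard levels 0..3.
    Returns X of shape (n_samples, timesteps, n_feat) and y of shape
    (n_samples,), where y[i] is the level of the week after window i
    (an assumed pairing, not confirmed by the paper).
    """
    n_samples = records.shape[0] - timesteps
    X = np.stack([records[i:i + timesteps] for i in range(n_samples)])
    y = labels[timesteps:timesteps + n_samples]
    return X, y

# Hypothetical series: 20 weeks with 38 features each.
rng = np.random.default_rng(0)
records = rng.random((20, 38))
labels = rng.integers(0, 4, size=20)

X, y = make_windows(records, labels)
print(X.shape, y.shape)
```

With this pairing a series of n weeks yields n - 4 samples, so reaching the 15,343 samples reported in the text would require a different treatment of the first weeks of each series, for example padding.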
To address these shortcomings of SGD, the Adam optimization method (Kingma and Ba, 2014) was adopted. It combines the advantages of two popular methods: AdaGrad (Duchi et al., 2011), which works well with sparse gradients, and RMSProp (Tieleman and Hinton, 2012), which works well in on-line and non-stationary settings. Adam calculates an independent adaptive learning rate for each parameter from the first and second moment estimates of the gradients, $m_t$ and $v_t$. For the Bi-LSTM-based network, at timestep $t$, it returns the updated parameters $\theta_t$ as below:

$$m_t = \beta_1 \times m_{t-1} + (1 - \beta_1) \times g_t, \qquad v_t = \beta_2 \times v_{t-1} + (1 - \beta_2) \times g_t^2,$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad \theta_t = \theta_{t-1} - \eta \times \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \xi}, \qquad (8)$$

where $m$ and $v$ are the first moment estimate and second raw moment estimate, respectively; $\beta_1$ and $\beta_2$ are the corresponding exponential decay rates; and $\xi$ is a small constant that prevents division by zero.

2.9. Implementation and performance measurement

Other traditional classification models, i.e., k-Nearest Neighbor (KNN) and random forest, were also implemented for cotton pest occurrence prediction in comparison with our Bi-LSTM model. The experiments ran on an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz (8 CPUs) with 8 GB RAM and the 64-bit Windows 10 operating system, programmed in Python 3.6. The proposed network was implemented with TensorFlow 0.11 (Abadi et al., 2016), while KNN and random forest were implemented with Scikit-learn (Pedregosa et al., 2011).

Here, the prediction of pests and diseases is a basic classification problem. In this work, Accuracy (ACC) (Accuracy (trueness and precision) of measurement methods and results, YYYY), Area Under the Curve (AUC) (Hanley and McNeil, 1983), Average Precision (AP) and F1-score are used to measure the effectiveness of prediction methods. For binary classification model outputs, there are only two types of results, positive and negative (denoted as P and N). Therefore a binary model has four outcomes for the case predictions: true positive (TP), true negative (TN), false positive (FP), and false negative (FN).

The definitions of ACC, AP and F1-score are shown below:

$$Acc = \frac{TP + TN}{P + N}$$

$$Precision = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$

$$AP = \sum_{n} (Recall_n - Recall_{n-1}) \times Precision_n$$

$$F1\text{-}score = \frac{2(Precision \times Recall)}{Precision + Recall} = \frac{2TP}{P + P'}, \qquad (9)$$

In a multiclass classification task, the notions of precision, recall, and F-measures can be applied to each label independently, and the results can then be combined across labels as specified by the "average" argument. Here, we set "average" as "weighted".

In addition, the Receiver Operating Characteristic (ROC) curve was introduced, and the area under the ROC curve (AUC) can be used to evaluate a classifier. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR). The closer the ROC curve is to the upper left corner, the better the classifier performs.

$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN} \qquad (10)$$

3. Results

3.1. Determination of parameters

This work used 15,343 records from the Crop Pest Decision Support System. In order to test the accuracy and generalization ability of the proposed network, 70% of the data were randomly selected to train the Bi-LSTM-based network and determine the parameters of the network. First, we set $l_r = 1$ for the number of Bi-LSTM layers, $l_{fc} = 1$ for the number of fully connected layers and $units_{fc} = 1$ for the unit number of fully connected layers, and chose a proper unit number of Bi-LSTM, $units_r$, from (4, 5, 6, 7, 8). Table 5 shows the prediction comparison with different values of $units_r$. The boldface row in the table represents the best performance among the values of $units_r$, i.e., the largest ACC, AUC, F1-score and AP. The results show that the best performance occurs when $units_r = 6$. So in the following experiments, we set $units_r$ as 6.

Table 5
Prediction comparison with respect to units_r.

units_r   AUC      ACC      F1-score   AP
4         0.9520   0.8728   0.8725     0.8982
5         0.9521   0.8750   0.8765     0.8995
6         0.9543   0.8773   0.8773     0.9044
7         0.9536   0.8753   0.8749     0.9016
8         0.9512   0.8711   0.8706     0.8970

Then, the time series sequences were used to choose a proper value for $l_r$ from {1, 2, 3}, with the other two parameters set as $units_r = 6$ and $l_{fc} = 1$. Table 6 shows the prediction comparison with different values of $l_r$. The boldface row in the table represents the best performance. Results showed that the best performance occurs when $l_r = 1$. The reason may be that the number of weights grows with the number of recurrent Bi-LSTM layers, so that the dataset is insufficient to fully train the larger number of weights. In practice, Bi-LSTM with more layers does not always perform well. Results in this work also showed that more Bi-LSTM layers are more likely to yield unstable results. Therefore, in the following experiments, $l_r$ is set as 1.

Similarly, on the same datasets, a proper value was chosen for $l_{fc}$ from {1, 2, 3} together with its units. Table 7 shows the prediction comparison with
respect to the parameter $l_{fc}$. The boldface row in the table represents the best performance. The model achieves the best performance when $l_{fc} = 1$. The reason is similar to that for the choice of $l_r$: a model with more layers has more weights to be trained and requires more computation. So in the following experiments, we set $l_{fc} = 1$ with 4 hidden units. The final fully connected layer is integrated into the Bi-LSTM model to yield the predictions of pests and diseases.

Table 6
Prediction comparison with respect to l_r.

Table 7
Prediction comparison with respect to l_fc.

l_fc        AUC      ACC      F1-score   AP
1[4]*       0.9543   0.8773   0.8773     0.9044
2[6,4]      0.9528   0.8750   0.8748     0.8997
3[6,6,4]    0.9506   0.8691   0.8702     0.8962

* The numbers in the square brackets stand for the number of the hidden units.

After building the basic framework of the proposed Bi-LSTM network, the other parameters were adjusted to make the model achieve higher performance, i.e., dropout = 0.1, batch size = 32 and learning rate = 0.001. The structure of our Bi-LSTM network is shown in Fig. 1. Compared with classical machine learning methods, one advantage of the deep learning model is that it can directly update network parameters for new data of the same type, without having to repeat feature selection and rebuild networks (LeCun et al., 2015). Bi-LSTM can not only update the network parameters in real time according to the current input data and be applied to predict the occurrences of other kinds of pests, but also has advantages in dealing with small data samples compared with traditional neural networks.

Although deep learning models generally do not require cumbersome and time-consuming feature selection, adequate feature inputs associated with the prediction targets still result in relatively high performance. Fig. 2 shows the prediction comparison of the model with only 9 features (8 climate features and 1 for pest value) and with all 38 features (12 climate features, 25 circulation parameters and 1 for pest value) on our network. Fig. 2 shows that the model with all 38 features slightly outperforms the model with only the 9 features. Table 8 shows the detailed scores and the boldface items in the table represent the best performance.

Table 8
Prediction results with different features on our network.

To show the power of our proposed method, the prediction was compared with other classical machine learning methods, such as KNN (Pedregosa et al., 2011), Random Forest (Hanley and McNeil, 1983) and the LSTM network. The parameters of these models were set as follows: for the LSTM network and our Bi-LSTM-based network, $units_r$, $l_r$ and $l_{fc}$ were set as 6, 1 and 1, respectively; for KNN, weights = 'distance', n_neighbors = 3, algorithm = 'ball_tree' and p = 2; for Random Forest, n_estimators was set as 100.

Fig. 3 and Table 9 show that our network obtains good performance on the occurrence prediction of cotton pests and diseases, while Fig. 4 shows the ROC curve of our network on the occurrence prediction of pests and diseases. The boldface items in the table represent the best performance, i.e., the largest ACC, AUC, AP and F1-score. The results show that the Bi-LSTM-based network achieves the best prediction performance, LSTM is second, Random Forest is third, and KNN performs the worst. Moreover, the proposed Bi-LSTM method achieves an AUC of 0.95 and an AP of 0.90, whereas it is difficult to achieve such high performance with traditional machine learning methods. LSTM and Bi-LSTM perform similarly. Although the results are based on a small dataset with 15,343 records, Bi-LSTM performs better than LSTM for 10 insect pests and diseases across 6 important locations in India.

3.3. Prediction comparison with other methods

Moreover, the prediction performance of our model on different types of data was investigated. According to the Crop Pest Decision Support System, the data of 6 areas in India and 10 different types of cotton pests and diseases are shown in Table 1 and Table 2, respectively. Our model was run on these two types of datasets separately and the results are shown in Table 10 and Table 11. Because the Coimbatore area dataset (208 records) and the Mealybug/Miridbug insect pest dataset (260 records) are too small to obtain stable prediction performance, their predictions were ignored here. From
Table 9 Table 11
Performance comparison with other methods. Performance comparison on different types of pests and diseases with Bi-LSTM
network.
Methods AUC ACC F1-score AP
Datasets(size) AUC ACC F1-score AP
KNN 0.8332 0.7684 0.7809 0.7180
Random Forest 0.9359 0.8258 0.8478 0.7878 Aphid(1032) 0.9401 0.8298 0.8234 0.8649
LSTM 0.9520 0.8701 0.8698 0.9017 Bollworm(7083) 0.9461 0.8675 0.8668 0.8941
Our Bi-LSTM method 0.9545 0.8784 0.8784 0.9059 Jassid(1974) 0.9661 0.8873 0.8871 0.9197
Spodoptera(630) 0.9225 0.8437 0.8291 0.8733
Thrips(832) 0.9548 0.8353 0.8326 0.8789
Whitefly(1508) 0.9404 0.8565 0.8473 0.8882
LeafBlight/LeafSpot(1924) 0.9669 0.9172 0.9095 0.9447
Table 10 and Table 11, results showed that our model achieved good
performance both on the datasets in different regions and on those of
different types of insect pests and diseases.
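For readers who want to reproduce columns such as ACC, F1-score and AP, the following is a minimal sketch of how these three metrics can be computed. It assumes a binary occurrence label (1 = occurrence, 0 = none) and untied scores for simplicity; how the reported values were averaged over pest-level classes is not stated here, and the toy labels, scores and the 0.5 threshold are purely illustrative.

```python
# Illustrative only: toy data and threshold are assumptions, not values from the paper.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive sample.

    Samples are ranked by decreasing score; ties are assumed not to occur.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical true labels and predicted occurrence probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold

print("ACC:", accuracy(y_true, y_pred))
print("F1 :", f1_score(y_true, y_pred))
print("AP :", average_precision(y_true, scores))
```

ACC and F1-score depend on the chosen decision threshold, whereas AP is computed from the ranking of the scores alone, which is one reason the columns can order methods differently.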
As Table 10 shows, the model achieves an AUC above 0.95 for the pests and diseases of most areas, with the exception of the Akola area. This might be because more complex factors affect the occurrence of pests and diseases in Akola. Compared with the prediction results in Table 10, the differences in Table 11 are more pronounced, which is to be expected: the occurrence of different types of insect pests differs considerably from that of crop diseases. Our model shows the best performance in predicting the occurrence of cotton LeafBlight and LeafSpot diseases, which may indicate a strong correlation between climate change and the occurrence of cotton disease.
Fig. 4. ROC curves of four pest level classes with Bi-LSTM network. Here “area”
means the area under each ROC curve.
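Per-class ROC curves such as those in Fig. 4 are commonly obtained with a one-vs-rest scheme, in which each pest level in turn is treated as the positive class. The sketch below assumes that scheme; the source does not confirm that it was used. It computes the area under each curve from the rank interpretation of AUC. The labels, the probability rows and the helper names `roc_auc` and `one_vs_rest_auc` are hypothetical.

```python
# Illustrative only: a one-vs-rest AUC for four classes on made-up predictions.

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Tied scores count as half a win, which matches the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both positive and negative samples are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(labels, probs, n_classes):
    """Return {class index: AUC}, treating each class as positive against the rest."""
    aucs = {}
    for c in range(n_classes):
        binary = [1 if y == c else 0 for y in labels]
        column = [row[c] for row in probs]
        aucs[c] = roc_auc(binary, column)
    return aucs


# Hypothetical pest levels (0..3) and predicted class probabilities per sample.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.1, 0.5, 0.3],
    [0.1, 0.1, 0.2, 0.6],
    [0.2, 0.2, 0.3, 0.3],
]

per_class = one_vs_rest_auc(labels, probs, n_classes=4)
macro_auc = sum(per_class.values()) / len(per_class)
for c, auc in per_class.items():
    print(f"class {c}: area = {auc:.4f}")
print(f"macro average: {macro_auc:.4f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for an illustration; a sort-based rank computation would be preferable on datasets with thousands of records.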
3.4. Occurrence forecasting of pests for different future times
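Forecasting for different future times can be framed as pairing each input window with the pest level observed a fixed number of steps later. The sketch below illustrates that framing under stated assumptions: records are weekly, "1 month" is approximated as 4 weekly steps, and the window length of 3 is arbitrary. The function `make_samples` and the `HORIZONS` mapping are hypothetical and are not taken from the paper.

```python
# Illustrative only: builds (input window, future label) pairs for several horizons.

# Assumed mapping from horizon names to weekly steps (1 month ~ 4 weeks).
HORIZONS = {"1 week": 1, "2 weeks": 2, "1 month": 4}


def make_samples(features, levels, window, horizon):
    """Pair each run of `window` consecutive weeks with the level `horizon` weeks later.

    features: list of per-week feature vectors
    levels:   list of per-week pest levels, same length as `features`
    Returns a list of (window_features, future_level) tuples.
    """
    samples = []
    last_start = len(features) - window - horizon
    for start in range(last_start + 1):
        end = start + window              # exclusive; last observed week is end - 1
        x = features[start:end]
        y = levels[end - 1 + horizon]     # label taken `horizon` weeks after the window
        samples.append((x, y))
    return samples


# Hypothetical series: 10 weeks, one feature per week, pest levels in 0..3.
features = [[float(week)] for week in range(10)]
levels = [week % 4 for week in range(10)]

for name, steps in HORIZONS.items():
    samples = make_samples(features, levels, window=3, horizon=steps)
    print(f"{name}: {len(samples)} samples, first label = {samples[0][1]}")
```

Longer horizons leave fewer usable samples from the same series and place the label further from the last observed week.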
Table 12
Occurrence forecasting of pests for different future times with our Bi-LSTM-based network.
Metrics    1 week  2 weeks  1 month
AUC        0.9530  0.9312   0.8979
ACC        0.8746  0.8332   0.7935
F1-score   0.8745  0.8264   0.7742
AP         0.9006  0.8699   0.8292

would like to build datasets with more factor features, including climate factors, the occurrence cycle of pests and diseases, and so on. On the other hand, we will try to construct a more effective model to predict the hazard level of pests and diseases, so that the prediction results are more responsive to the data, making it easier for people to develop detailed pest control strategies.

CRediT authorship contribution statement
Genuer, R., Poggi, J.-M., Tuleau-Malot, C., 2010. Variable selection using random forests. papers/v28/pascanu13.html.
Pattern Recogn. Lett. 31, 2225–2236. https://doi.org/10.1016/j.patrec.2010.03.014. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M.,
Graves, A., 2013. Generating sequences with recurrent neutral networks, CoRR abs/1506. Prettenhofer, P., Weiss, R., Dubourg, V., VanderPlas, J., Passos, A., Cournapeau, D.,
02078. arXiv:1506.02078. http://arxiv.org/abs/1506.02078. Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: Machine learning in py-
Hang, J., Zhang, D., Chen, P., Zhang, J., Wang, B., 2019. Classification of plant leaf thon. J. Mach. Learn. Res. 12, 2825–2830 http://dl.acm.org/cita-
diseases based on improved convolutional neural network. Sensors 19 (19), 4161. tion.cfm?id=2078195.
https://doi.org/10.3390/s19194161. Prasetyo, S.Y.J., Agus, Y.H., Dewi, C., Simanjuntak, B.H., Hartomo, K.D., 2017. Geodata:
Hanley, J.A., McNeil, B.J., 1983. A method of comparing the areas under receiver op- Information system based on geospatial for early warning tracking and analysis
erating characteristic curves derived from the same cases. Radiology 148, 839–843. agricultural plant diseases in central java 180. 012070. doi:10.1088/1757-899x/
https://doi.org/10.1148/radiology.148.3.6878708. 180/1/012070.
Hochreiter, S., 1991. Untersuchungen zu dynamischen neuronalen netzen, Master’s Ruder, S., 2016. An overview of gradient descent optimization algorithms, CoRR abs/
thesis, Institut fur Informatik. Technische Universitat, Munchen. 1609.04747. arXiv:1609.04747. http://arxiv.org/abs/1609.04747.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8), Schuster, M., Paliwal, K.K., 1997. Bidirectional recurrent neural networks. IEEE Trans.
1735–1780. Signal Process. 45 (11), 2673–2681. https://doi.org/10.1109/78.650093.
Jiang, Z., Zhao, C., He, B., Guan, Y., Jiang, J., 2017. De-identification of medical records Singh, S., Gupta, M., Pandher, S., Kaur, G., Rathore, P., Palli, S.R., 2018. Selection of
using conditional random fields and long short-term memory networks. J. Biomed. housekeeping genes and demonstration of rnai in cotton leafhopper, amrasca bi-
Inform. 75S, S43–S53. https://doi.org/10.1016/j.jbi.2017.10.003. guttula biguttula (ishida). Amrasca biguttula biguttula 13 (1), e0191116. https://doi.
Kalchbrenner, N., Danihelka, I., Graves, A., 2015. Grid long short-term memory, CoRR org/10.1371/journal.pone.0191116.
abs/1507.01526. arXiv:1507.01526. http://arxiv.org/abs/1507.01526. Sutskever, I., Vinyals, O., Le, Q.V., 2014. Sequence to sequence learning with neural
Karpathy, A., Johnson, J., Li, F., 2015. Visualizing and understanding recurrent networks, networks. In: In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D.,
CoRR abs/1506.02078. arXiv:1506.02078. http://arxiv.org/abs/1506.02078. Weinberger, K.Q. (Eds.), Advances in Neural Information Processing Systems 27:
Kelly, H.Y., Dufault, N.S., Walker, D.R., Isard, S.A., Schneider, R.W., Giesler, L.J., Wright, Annual Conference on Neural Information Processing Systems 2014, December 8–13
D.L., Marois, J.J., Hartman, G.L., 2015. From select agent to an established pathogen: 2014, Montreal, Quebec, Canadapp. 3104–3112 http://papers.nips.cc/paper/5346-
The response to phakopsora pachyrhizi (soybean rust) in north america 105. pp. sequence-to-sequence-learning-with-neural-networks.
905–916. doi:10.1094/phyto-02-15-0054-fi. Tieleman, T., Hinton, G., 2012. Neural networks for machine learning, Tech. rep.
Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization, CoRR abs/1412. Wu, K., Lu, Y., Wang, Z., 2009. Advance in integrated pest management of crops in china.
6980. arXiv:1412.6980. http://arxiv.org/abs/1412.6980. Chinese Bull. Entomol. 46 (6), 831–836.
LeCun, Y., Bengio, Y., Hinton, G.E., 2015. Deep learning. Nature 521 (7553), 436–444. Xie, C., Wang, R., Zhang, J., Chen, P., Dong, W., Li, R., Chen, T., Chen, H., 2018. Multi-
https://doi.org/10.1038/nature14539. level learning features for automatic classification of field crop pests. Computers
Li, W., Chen, P., Wang, B., Xie, C., 2019. Automatic localization and count of agricultural Electron. Agric. 152, 233–241. https://doi.org/10.1016/j.compag.2018.07.014.
crop pests based on an improved deep learning pipeline 9. doi:10.1038/s41598-019- Xie, J., Liu, X., Dajun Zeng, D., 2018. Mining e-cigarette adverse events in social media
43171-0. using bi-lstm recurrent neural network with word embedding representation. J. Am.
Luo, J., Zhang, S., Ren, X., 2017. Research progress of cotton insect pests in china in Med. Inform. Assoc.: JAMIA 25, 72–80. https://doi.org/10.1093/jamia/ocx045.
recent ten years. Cotton Sci. 19 (5), 385–390. Zhang, W., zhong Jing, T., Yan, S., 2017. Studies on prediction models of dendrolimus
Miao, Y., Gowayyed, M., Metze, F., EESEN: end-to-end speech recognition using deep superans occurrence area based on machine learning. J. Beijing Forestry Univ. 39 (1),
RNN models and wfst-based decoding. In: 2015 IEEE Workshop on Automatic Speech 85–93.
Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, Zhang, Q., Wang, H., Dong, J., Zhong, G., Sun, X., 2017. Prediction of sea surface tem-
2015, IEEE, 2015, pp. 167–174. doi:10.1109/ASRU.2015.7404790. perature using long short-term memory. IEEE Geosci. Remote Sensing Lett. 14 (10),
Mikolov, T.A., 2012. Statistical language models based on neural networks. Ph.D. thesis. 1745–1749. https://doi.org/10.1109/LGRS.2017.2733548.
Brno University of Technology. Zhao, R., Yan, R., Wang, J., Mao, K., 2017. Learning to monitor machine health with
Pascanu, R., Mikolov, T., Bengio, Y., 2013. On the difficulty of training recurrent neural convolutional bi-directional lstm networks. Sensors (Basel, Switzerland) 17. https://
networks. In: Proceedings of the 30th International Conference on Machine Learning, doi.org/10.3390/s17020273.
ICML 2013, Atlanta, GA, USA, 16–21 June 2013, Vol. 28 of JMLR Workshop and Zhou, Y., Gao, P., 2014. Design and implementation of ensemble forecast system for crop
Conference Proceedings, JMLR.org, pp. 1310–1318. http://jmlr.org/proceedings/ diseases and pests. J. Computer Appl. 34 (S1), 141–144.