Plant Disease Recognition: A Large-Scale Benchmark Dataset and a Visual Region and Loss Reweighting Approach
Abstract— Plant disease diagnosis is critical for agriculture because of its importance for increasing crop production. Recent advances in image processing offer a new way to address this problem via visual plant disease analysis. However, there are few works in this area, let alone systematic research. In this paper, we systematically investigate the problem of visual plant disease recognition for plant disease diagnosis. Compared with other types of images, plant disease images generally exhibit randomly distributed lesions, diverse symptoms and complex backgrounds, which makes it hard to capture discriminative information. To facilitate plant disease recognition research, we construct a new large-scale plant disease dataset with 271 plant disease categories and 220,592 images. Based on this dataset, we tackle plant disease recognition by reweighting both visual regions and the loss to emphasize diseased parts. We first compute the weights of all the divided patches from each image based on the cluster distribution of these patches, so that each weight indicates the discriminative level of its patch. Then we allocate the weight to the loss of each patch-label pair during weakly-supervised training to enable discriminative disease part learning. Finally, we extract patch features from the network trained with loss reweighting and use an LSTM network to encode the weighted patch feature sequence into a comprehensive feature representation. Extensive evaluations on this dataset and another public dataset demonstrate the advantage of the proposed method. We expect this research to further the agenda of plant disease recognition in the image processing community.

Index Terms— Plant disease recognition, fine-grained visual classification, reweighting approach, feature aggregation.

I. INTRODUCTION

PLANT diseases pose severe threats to global food security by reducing crop production all over the world. According to statistics, about 20%–40% of all crop losses globally are due to plant diseases [1]. Therefore, plant disease diagnosis is critical to preventing the spread of plant diseases and reducing economic losses in agriculture. Most plant disease diagnosis methods rely heavily on either molecular assays or observation by plant protection specialists. However, the former is complicated and constrained to centralized labs, while the latter is time-consuming and prone to errors. Currently, image-based technologies are widely applied to various interdisciplinary tasks by deciphering visual content, e.g., medical imaging [2], food computing [3] and cellular image analysis [4]. Benefiting from recent advances in machine learning, especially deep learning [5], we argue that plant image analysis and recognition can also provide a new way for plant disease diagnosis. Meanwhile, applications in visual plant disease diagnosis in turn promote the development of image processing technologies.
… field have begun to develop, such as aerial phenotyping [6] and fingerprinting of leaves [1]. However, these methods rely heavily on either expensive devices or complex molecular technology, and thus are not easily popularized. Recently, some works [7]–[12] have adopted deep learning methods for plant disease recognition. However, most of them directly extract deep features from plant disease images without consider… restricted to small datasets with fewer categories and simple visual backgrounds.

According to our survey, plant disease images taken in real-world scenarios have three main distinctive characteristics. (1) Randomly distributed lesions. Foliar lesions may occur at random positions on plant leaves. As shown in Fig. 1 (a), cherry fungal shot hole disease is distributed over many different parts of the leaf, including the top, left and right positions. Because deep convolutional neural networks trained with image-level labels only tend to focus on the …

Manuscript received March 11, 2020; revised August 28, 2020 and October 10, 2020; accepted December 28, 2020. Date of publication January 14, 2021; date of current version January 22, 2021. This work was supported in part by the National Natural Science Foundation of China under Project 61932003 and Project 61772051; in part by the National Key Research and Development Plan under Grant 2019YFC1521102; in part by the Beijing Natural Science Foundation under Grant L182016; in part by the Beijing Program for International S&T Cooperation Project under Grant Z191100001619003; and in part by the Shenzhen Research Institute of Big Data (Shenzhen). The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Guo-Jun Qi. (Corresponding author: Lili Wang.)

Xinda Liu and Lili Wang are with the State Key Laboratory of Virtual Reality Technology and Systems, Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing 100191, China, and also with the Peng Cheng Laboratory, Shenzhen 518066, China.

Weiqing Min and Shuqiang Jiang are with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China.

Shuhuan Mei is with Beijing Puhui Sannong Technology Company Ltd., Beijing 100190, China.

Digital Object Identifier 10.1109/TIP.2021.3049334
Fig. 2. Disease leaf image samples from various categories of PDD271 (one sample per category). The dataset contains three macro-classes: Fruit Tree, Vegetable, and Field Crops.
… than 75,000 images belonging to 102 categories for insect pest recognition. In contrast, PDD271 aims at advancing plant disease recognition. We believe that PDD271 and IP102 are highly complementary and can jointly promote the development of intelligent agricultural analysis and understanding in the image processing and computer vision community.

B. Fine-Grained Visual Classification

Fine-grained image recognition aims to distinguish subordinate categories of objects such as birds and food. In the early stage, deep learning-based methods [25], [26] used strongly supervised mechanisms with part bounding box annotations to learn to attend to discriminative parts. Recent research [3], [13], [27]–[32] has focused on weakly-supervised recognition methods that require no costly object part locations or attribute annotations. For example, Yang et al. [32] randomly initialized many anchors, extracted their features to estimate their informativeness using the RPN method, and finally chose the informative regions to improve classification performance. Several attention-based methods have also been proposed for fine-grained visual classification. For example, Hu et al. [33] used attention maps to guide data augmentation, Peng et al. [34] proposed an object-part attention model that selects discriminative regions subject to object-part spatial constraints, and SeNet154 [35] enhances recognition performance with spatial-channel attention. However, attention-based methods tend to focus on the most discriminative parts while missing other parts of the whole image.
The whole data construction took about two years. The resulting PDD271 contains 220,592 images in 271 categories. As shown in Fig. 3, the minimum number of images per category is over 400 and the maximum is 2,000. This balanced distribution ensures stable model training. A reliable dataset plays an essential role in developing image processing technologies in a specific area. For example, HiEve [38] is vital to human-centric analysis, as ATRW [39] is to wildlife conservation. Likewise, the proposed PDD271 dataset offers large coverage and diversity of plant diseases. It will further the plant disease recognition agenda and extend image processing techniques into the agricultural area.

IV. FRAMEWORK

In this section, we introduce the proposed framework, which explores a multi-scale strategy and reweights both visual regions and the loss during weakly-supervised learning to emphasize discriminative diseased parts for plant disease recognition. As shown in Fig. 6, the framework mainly consists of three stages, namely Cluster-based Region Reweighting (CRR), Training with Loss Reweighting (TLR) and Weighted Feature Integration (WFI). CRR takes all the divided patches from plant disease images as input and sets the weight of each patch according to the cluster distribution of their visual features. For each patch-label pair, TLR allocates the corresponding weight to the loss during weakly-supervised training to enable discriminative disease part learning. Based on the patch features extracted by TLR and the corresponding weights from CRR, WFI uses an LSTM network to encode the weighted patch feature sequence into a comprehensive feature representation. Section IV-A details CRR, Section IV-B introduces TLR and Section IV-C presents WFI.
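The way the three stages hand data to each other can be sketched in pure Python. This is not the authors' implementation. It assumes that TLR multiplies a standard softmax cross-entropy by the CRR weight of each patch-label pair. It also assumes that WFI scales each patch feature by its weight before running a single-layer LSTM, and uses that LSTM's final hidden state as the image representation. The function names, the gate order and the `params` layout are hypothetical.

```python
import math


def weighted_patch_loss(logits, label, weight):
    """TLR sketch: softmax cross-entropy of one patch-label pair, scaled by the patch's CRR weight."""
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return weight * (log_sum_exp - logits[label])


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def _matvec(rows, vec):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(r * v for r, v in zip(row, vec)) for row in rows]


def lstm_encode(patch_features, patch_weights, params):
    """WFI sketch: scale each patch feature by its weight, then run one LSTM layer.

    params["W"] (4H x D), params["U"] (4H x H) and params["b"] (4H) stack the
    input, forget, output and candidate gates in that (assumed) order.
    Returns the final hidden state (length H) as the image-level representation.
    """
    hidden = len(params["b"]) // 4
    h = [0.0] * hidden
    c = [0.0] * hidden
    for feat, w in zip(patch_features, patch_weights):
        x = [w * v for v in feat]  # weighted patch feature
        z = [a + r + b for a, r, b in zip(_matvec(params["W"], x), _matvec(params["U"], h), params["b"])]
        i = [_sigmoid(v) for v in z[:hidden]]
        f = [_sigmoid(v) for v in z[hidden:2 * hidden]]
        o = [_sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [fg * cg + ig * gg for fg, cg, ig, gg in zip(f, c, i, g)]
        h = [og * math.tanh(cg) for og, cg in zip(o, c)]
    return h


if __name__ == "__main__":
    # Toy example: 3 classes, two 2-D patch features, hidden size H = 1.
    print(weighted_patch_loss([2.0, 0.5, -1.0], label=0, weight=0.8))
    params = {"W": [[0.1, -0.2]] * 4, "U": [[0.05]] * 4, "b": [0.0] * 4}
    print(lstm_encode([[1.0, 0.0], [0.0, 1.0]], [0.9, 0.3], params))
```

A real system would learn `params` with a deep learning framework. The explicit loops here only make the data flow visible: a low-weight patch contributes both a smaller loss during training and a damped input to the sequence encoder.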
A. Cluster-Based Region Reweighting

Many diseases present small and scattered lesions, such as pumpkin mildew, pear frog-eye leaf spot and actinidia brown spot. Deep convolutional neural networks trained with image-level labels often overlook these lesions while focusing on more salient parts. Considering these situations, we explore a multi-scale strategy that divides the images into non-overlapping patches and enlarges every patch to avoid missing diseased patches. However, disease-independent patches, such as complex backgrounds and healthy parts, are enhanced even more in this process, which could lead to a severe imbalance between the diseased patches and the irrelevant ones. To address this problem, we use the visual similarity among patches of the same disease to cluster them. Afterwards, we reweight the patches based on the clustering result so that each weight indicates the discriminative level of its patch.

Formally, all patches from all the original training images form a new training set. Let $X \in \mathbb{R}^{m \times N}$ denote the visual features of these patches, where $m$ is the dimension of the visual feature and $N$ is the number of training patches. We then cluster these patches into $k$ clusters $c$ with centroids $\{\mu_1, \mu_2, \ldots, \mu_k\} \in \mathbb{R}^m$. To compute the weight $w_x$ of a patch feature $x \in X$, we first compute the probability distribution $p_x$ of $x$ over all clusters. Then, $w_x$ is computed as

$$w_x = p_x \cdot w_c, \tag{1}$$

where $w_c = [w_{c_1}, \ldots, w_{c_i}, \ldots, w_{c_k}]$ and $w_{c_i}$ denotes the weight of the cluster $c_i$.

Normally, patches containing similar visual symptoms are likely to be assigned to the same cluster. When the distance among clusters is small, the visual phenotypes of the different diseases are similar and hard for the deep model to distinguish. Therefore, these clusters are given higher weights to enhance their influence in the subsequent feature learning and integration. The size of a cluster is also an important indicator. The distribution of patches over different diseases is highly skewed. For example, the number of non-diseased patches containing complex backgrounds and healthy foliar parts is very large, whereas the number of patches containing cotton eye spot disease is small because the symptoms of this disease are concentrated, which leads to poor classification performance. Meanwhile, the distance between two clusters indicates their visual difference. If a cluster is far from the other clusters, discriminative features for it are easy to obtain, so we assign it a small weight. Hence, we assign suitable weights to the clusters to make their influences as balanced as possible.

Given all these considerations, we assign the cluster weights according to the following rule: the larger the cluster and the farther it is from the others, the smaller its weight. We use the monotonically decreasing function $F(x) = e^{x/(x-1)}$ to model this relationship. According to the size of each cluster and the distance distribution among the cluster centroids, we compute the weight of cluster $c_i$ as follows:

$$w_{c_i} = F(N_{c_i}) \times F\Big(\sum_{j \neq i,\; j \in \{1, \ldots, k\}} d(\mu_i, \mu_j)\Big), \tag{2}$$

where $N_{c_i}$ is the number of patches in cluster $c_i$ and $d(\mu_i, \mu_j)$ is the distance between the centroids $\mu_i$ and $\mu_j$.
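Equations (1) and (2) can be written out directly in code. In this sketch, d is assumed to be the Euclidean distance. The text does not define p_x further, so it is modeled here as a softmax over negative distances to the centroids, with a hypothetical temperature `tau`. Because F(x) = e^{x/(x−1)} is singular at x = 1 and falls from +∞ toward e on (1, ∞), the sketch only accepts arguments larger than 1. The example cluster sizes are illustrative only.

```python
import math


def F(x):
    """F(x) = exp(x / (x - 1)); decreasing for x > 1, the range this sketch assumes."""
    if x <= 1:
        raise ValueError("sketch assumes x > 1 (F is singular at x = 1)")
    return math.exp(x / (x - 1))


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def cluster_weights(centroids, sizes):
    """Eq. (2): w_ci = F(N_ci) * F(sum_{j != i} d(mu_i, mu_j))."""
    weights = []
    for i, mu_i in enumerate(centroids):
        spread = sum(euclidean(mu_i, mu_j) for j, mu_j in enumerate(centroids) if j != i)
        weights.append(F(sizes[i]) * F(spread))
    return weights


def soft_assignment(x, centroids, tau=1.0):
    """Hypothetical p_x: softmax over negative distances to the centroids."""
    scores = [-euclidean(x, mu) / tau for mu in centroids]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def patch_weight(x, centroids, w_c, tau=1.0):
    """Eq. (1): w_x = p_x . w_c (inner product of assignment probabilities and cluster weights)."""
    p_x = soft_assignment(x, centroids, tau)
    return sum(p * w for p, w in zip(p_x, w_c))


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
    sizes = [500, 40, 60]  # illustrative: one large background cluster, two small lesion clusters
    w_c = cluster_weights(centroids, sizes)
    print(w_c, patch_weight([2.9, 0.1], centroids, w_c))
```

For large arguments F approaches e. With raw patch counts in the hundreds, the size term therefore varies only slightly. This excerpt does not say whether $N_{c_i}$ or the summed distances are normalized before applying F.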
PERFORMANCE COMPARISON FOR DIFFERENT TRAINING METHODS

Fig. 7. The result of the elbow method. The blue line shows how the SSE changes with K, and the orange line is the MA line.
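The elbow analysis behind Fig. 7 can be outlined in three steps. First, run k-means for a range of K. Second, record the sum of squared errors (SSE) of each clustering. Third, smooth the resulting curve with a moving average. This sketch assumes plain Lloyd's algorithm with random initial centroids and a simple windowed moving average. The window size, iteration count and toy data are illustrative choices, and reading "MA" as a moving average is itself an assumption. This excerpt does not state how K was finally chosen from the curve.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_sse(points, k, iters=50, seed=0):
    """Lloyd's k-means with random initial centroids; returns the SSE of the final partition."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign))


def moving_average(values, window=3):
    """Average of each run of `window` consecutive values (the smoothed curve)."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
    sse = [kmeans_sse(data, k) for k in range(1, 8)]
    print(sse)
    print(moving_average(sse, window=3))
```

Each SSE is computed from the final partition and the means of its clusters, so it never exceeds the K = 1 value. The "elbow" is the K after which further increases bring only small SSE reductions.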
Fig. 10. Qualitative results. From top to bottom: (a) the original image with diseased parts annotated by red boxes, (b) the feature map from the last convolution layer of VGG16, (c) the feature map from the last convolution layer of ResNet152, (d) the feature map from the last convolution layer of SeNet154, and (e) visualization of the proposed CRR weights for each patch. Red indicates high weights and blue indicates relatively low weights. For clarity, we only visualize weights greater than 0.75. CRR considers more regions and captures more characteristics.
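The heatmaps compared in Fig. 10 are produced with gradient-weighted class activation mapping [49]. Grad-CAM weights each channel A^k of the last convolutional layer by the spatial mean of the class-score gradient with respect to that channel, then keeps the positive part of the weighted sum. The minimal sketch below makes three simplifications. First, it assumes the activations and gradients have already been extracted as nested lists; pulling them from VGG16, ResNet152 or SeNet154 is framework-specific and omitted. Second, it skips the usual upsampling to image resolution. Third, it adds a hypothetical `keep_high_weights` helper that mirrors the 0.75 display threshold used for the CRR weights.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map: ReLU(sum_k alpha_k * A^k), where alpha_k is the spatial mean of dy/dA^k.

    activations, gradients: nested lists shaped [channels][height][width].
    Returns a height x width map rescaled to [0, 1] for display.
    """
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for act, grad in zip(activations, gradients):
        alpha = sum(sum(row) for row in grad) / (height * width)  # channel importance
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * act[i][j]
    cam = [[max(v, 0.0) for v in row] for row in cam]  # ReLU keeps positive evidence only
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


def keep_high_weights(patch_weights, threshold=0.75):
    """Hypothetical helper: zero out patch weights below the display threshold."""
    return [[w if w >= threshold else 0.0 for w in row] for row in patch_weights]


if __name__ == "__main__":
    acts = [[[1.0, 2.0], [0.0, -1.0]]]   # one channel, 2 x 2 map
    grads = [[[1.0, 1.0], [1.0, 1.0]]]
    print(grad_cam(acts, grads))          # [[0.5, 1.0], [0.0, 0.0]]
    print(keep_high_weights([[0.8, 0.5], [0.75, 0.1]]))
```

The ReLU step discards regions that lower the class score. This is why a channel with negative gradients can switch a region off in the final map.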
TABLE VII
IMPACT OF ORDERS. THE 'T', 'B', 'L', AND 'R' DENOTE THE TOP, … TO BOTTOM AND LEFT TO RIGHT. THE 'RD' DENOTES THE RANDOM ORDER. THE 'FIXED' DENOTES THAT THE ORDER … PATCH LIST FOR EACH IMAGE IS UNFIXED

… is diseased no matter where the lesions appear. Another possible explanation is that the uncertain order is likely to enhance the power of networks.

8) Visualization: We visualize the parts emphasized by different methods via gradient-weighted class activation mapping (Grad-CAM) [49]. Fig. 10 shows the visualization results of some typical deep architectures, such as VGG16 and ResNet152. The reweighted maps of the proposed cluster-based region reweighting strategy are shown in Fig. 10 (e), where we only visualize the weight of patch $x$ when $w_x \geq 0.75$.

Compared with the feature maps from typical deep networks, the proposed reweighted maps cover more discriminative regions. VGG16 and ResNet152 tend to focus on disease-irrelevant regions while ignoring some useful information. Our approach attends to multiple scattered regions, which is more appropriate for plant disease recognition. The visualization results on PDD271 further demonstrate the effectiveness of the proposed cluster-based reweighting strategy.

In addition, we show the confusion matrix of our method on PDD271 in Fig. 11, where the vertical axis shows the ground-truth classes and the horizontal axis shows the predicted classes. Yellower colors indicate better performance. Our method still does not achieve perfect performance for some plant disease categories. We enlarge specific regions to highlight the misclassified results and show some samples from the confused categories. These plant disease categories are very similar in visual appearance and texture; even humans cannot easily distinguish among them. A probable solution is to design more fine-grained visual feature learning methods or to use multi-source information from different sensors to classify these plant disease categories.

C. Experiment on PlantVillage Dataset

Besides PDD271, we also conduct an evaluation on another publicly available benchmark dataset, the …
