Plant Disease Recognition: A Large-Scale Benchmark Dataset and a Visual Region and Loss Reweighting Approach
Abstract— Plant disease diagnosis is critical for agriculture due to its importance for increasing crop production. Recent advances in image processing offer us a new way to address this issue via visual plant disease analysis. However, there are few works in this area, let alone systematic studies. In this paper, we systematically investigate the problem of visual plant disease recognition for plant disease diagnosis. Compared with other types of images, plant disease images generally exhibit randomly distributed lesions, diverse symptoms and complex backgrounds, and thus it is hard to capture discriminative information from them. To facilitate plant disease recognition research, we construct a new large-scale plant disease dataset with 271 plant disease categories and 220,592 images. Based on this dataset, we tackle plant disease recognition via reweighting both visual regions and the loss to emphasize diseased parts. We first compute the weights of all the divided patches from each image based on the cluster distribution of these patches to indicate the discriminative level of each patch. Then we allocate the weight to the loss of each patch-label pair during weakly-supervised training to enable discriminative disease part learning. We finally extract patch features from the network trained with loss reweighting, and utilize an LSTM network to encode the weighted patch feature sequence into a comprehensive feature representation. Extensive evaluations on this dataset and another public dataset demonstrate the advantage of the proposed method. We expect this research to further the agenda of plant disease recognition in the community of image processing.

Index Terms— Plant disease recognition, fine-grained visual classification, reweighting approach, feature aggregation.

I. INTRODUCTION

PLANT diseases cause severe threats to global food security by reducing crop production all over the world. According to the statistics, about 20%-40% of all crop losses globally are due to plant diseases [1]. Therefore, plant disease diagnosis is critical to preventing the spread of plant diseases and reducing economic losses in agriculture. Most plant disease diagnosis methods heavily rely on either molecular assays or a plant protector's observation. However, the former is complicated and constrained to centralized labs, while the latter is time-consuming and prone to errors. Currently, image-based technologies are being widely applied to various interdisciplinary tasks via deciphering visual content, e.g., medical imaging [2], food computing [3] and cellular image analysis [4]. Benefiting from recent advances in machine learning, especially deep learning [5], we assert that plant image analysis and recognition can also provide a new way for plant disease diagnosis. Meanwhile, applications in visual plant disease diagnosis conversely promote the development of image processing technologies.

Research and exploration on plant image analysis in this field have begun to develop, such as aerial phenotyping [6] and fingerprinting of leaves [1]. However, these methods heavily rely on either expensive devices or complex molecular technology, and thus are not easily popularized. Recently, some works [7]–[12] have adopted deep learning methods for plant disease recognition. However, most of them directly extract deep features from plant disease images without considering the characteristics of the task. In addition, these works are restricted to small datasets with few categories and simple visual backgrounds.

According to our survey, there are three main distinctive characteristics of plant disease images taken in real-world scenarios. (1) Randomly distributed lesions. Foliar lesions may occur randomly anywhere on a plant leaf. As shown in Fig. 1 (a), the cherry fungal shot hole disease is distributed in many different parts of the leaf, including the top, left and right positions. Because deep convolutional neural networks trained with image level labels only tend to focus on the

Manuscript received March 11, 2020; revised August 28, 2020 and October 10, 2020; accepted December 28, 2020. Date of publication January 14, 2021; date of current version January 22, 2021. This work was supported in part by the National Natural Science Foundation of China under Project 61932003 and Project 61772051; in part by the National Key Research and Development Plan under Grant 2019YFC1521102; in part by the Beijing Natural Science Foundation under Grant L182016; in part by the Beijing Program for International S&T Cooperation Project under Grant Z191100001619003; and in part by the Shenzhen Research Institute of Big Data (Shenzhen). The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Guo-Jun Qi. (Corresponding author: Lili Wang.)

Xinda Liu and Lili Wang are with the State Key Laboratory of Virtual Reality Technology and Systems, Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing 100191, China, and also with the Peng Cheng Laboratory, Shenzhen 518066, China (e-mail: [email protected]; [email protected]).

Weiqing Min and Shuqiang Jiang are with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: [email protected]; [email protected]).

Shuhuan Mei is with Beijing Puhui Sannong Technology Company Ltd., Beijing 100190, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIP.2021.3049334
1941-0042 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Leeds. Downloaded on August 24,2021 at 22:40:20 UTC from IEEE Xplore. Restrictions apply.
2004 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 30, 2021
LIU et al.: PLANT DISEASE RECOGNITION: A LARGE-SCALE BENCHMARK DATASET 2005
TABLE I
STATISTICS ON EXISTING PLANT DISEASE DATASETS

Fig. 2. Disease leaf image samples from various categories of PDD271 (one sample per category). The dataset contains three macro-classes: Fruit Tree, Vegetable, and Field Crops.
than 75,000 images belonging to 102 categories for insect pest recognition. In contrast, PDD271 aims at advancing plant disease recognition. We believe that PDD271 and IP102 are complementary and can jointly promote the development of intelligent agriculture analysis and understanding in the image processing and computer vision community.

B. Fine-Grained Visual Classification

Fine-grained image recognition aims to distinguish sub-ordinate categories, such as birds and food. In the early stage, deep-learning researchers [25], [26] first used strongly supervised mechanisms with part bounding box annotations to learn to attend to discriminative parts. Recent studies [3], [13], [27]–[32] have focused on weakly-supervised recognition methods without high-cost object part locations or attribute annotations. For example, Yang et al. [32] initialized many anchors randomly, extracted their features to estimate their informativeness with the RPN method, and finally chose the informative regions to improve classification performance. Several attention-based methods have also been proposed for fine-grained visual classification. For example, Hu et al. [33] used attention maps to guide data augmentation, Peng et al. [34] proposed the object-part attention model to select discriminative regions subject to the object-part spatial constraint, and SENet154 [35] enhances recognition performance with spatial-channel attention. However, attention-based methods tend to focus on the most discriminative parts while missing other parts of the whole image.
The whole data construction took about two years. The resulting PDD271 contains 220,592 images and 271 categories. As shown in Fig. 3, the minimum number of images per category is over 400 and the maximum is 2000. This balanced distribution ensures the stability of model training. A reliable dataset plays an essential role in developing image processing technologies in a specific area. For example, HiEve [38] is vital to human-centric analysis, as is ATRW [39] to wildlife conservation. Likewise, the proposed dataset PDD271 offers large coverage and diversity of plant diseases. It will further the plant disease recognition agenda and expand image processing techniques into the agricultural area.

IV. FRAMEWORK

In this section, we introduce the proposed framework, which explores a multi-scale strategy and reweights both visual regions and the loss during weakly-supervised learning to emphasize discriminative diseased parts for plant disease recognition. As shown in Fig. 6, the framework mainly consists of three stages, namely Cluster-based Region Reweighting (CRR), Training with Loss Reweighting (TLR) and Weighted Feature Integration (WFI). CRR takes all the divided patches from plant disease images as input and sets the weight of each patch according to the cluster distribution of the visual features of these patches. For each patch-label pair, TLR allocates the corresponding weight to its loss during weakly-supervised training in order to enable discriminative disease part learning. Based on the patch features extracted by TLR and the corresponding weights from CRR, WFI utilizes an LSTM network to encode the weighted patch feature sequence into a comprehensive feature representation. Section IV-A details CRR, Section IV-B introduces TLR and Section IV-C presents WFI.
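The loss-reweighting idea behind TLR can be sketched in a few lines (a minimal illustration assuming a standard softmax cross-entropy base loss; the function and variable names here are ours, not the paper's implementation):

```python
import math

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for a single patch.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def tlr_loss(patch_logits, patch_labels, patch_weights):
    # Each patch-label pair contributes its cross-entropy scaled by the
    # CRR weight of that patch; the weighted losses are then averaged.
    weighted = [w * cross_entropy(z, y)
                for z, y, w in zip(patch_logits, patch_labels, patch_weights)]
    return sum(weighted) / sum(patch_weights)
```

With all weights equal this reduces to the ordinary mean cross-entropy; larger CRR weights make the corresponding diseased patches dominate the gradient.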
A. Cluster-Based Region Reweighting

Many diseases present small and scattered lesions, such as pumpkin mildew, pear frog-eye leaf spot and actinidia brown spot. Deep convolutional neural networks trained with image-level labels often overlook these lesions while focusing on more salient parts. Considering these situations, we explore a multi-scale strategy by dividing the images into non-overlapping patches and enlarging every patch to avoid missing diseased patches. However, the disease-independent patches, such as the complex backgrounds and the healthy parts, are enhanced even more in the above process, which could lead to a severe imbalance between the diseased patches and the irrelevant ones. To address this problem, we attempt to use visual similarity to cluster the patches of the same disease. Afterwards, we reweight the patches based on the clustering result to indicate the discriminative level of each patch.

Formally, all patches from all the original training images form a new training set. Let X ∈ R^{m×N} denote the visual features of these patches, where m is the dimension of the visual feature and N is the number of training patches. We then cluster these patches into k clusters {c_1, …, c_k} with centroids {μ_1, μ_2, …, μ_k}, μ_i ∈ R^m. To compute the weight w_x, x ∈ X, the weights of the clusters w_c and the probability distribution p_x of x over all clusters are computed. Then, w_x is computed as

w_x = p_x · w_c,  (1)

where w_c = [w_{c_1}, …, w_{c_i}, …, w_{c_k}] and w_{c_i} denotes the weight of the cluster c_i.

Normally, the patches containing similar visual symptoms are likely to be assigned to the same clusters. When the distance between clusters is small, the visual phenotypes of the corresponding diseases are similar and hard for the deep model to distinguish. Therefore, these clusters are given higher weights to enhance their influence in follow-up feature learning and integration.

The size of a cluster is also an important indicator. There is a highly skewed distribution of different disease patches. For example, the number of non-diseased patches containing complex backgrounds and foliar healthy parts is very large, but the number of patches containing cotton eye spot disease is small because its symptoms are concentrated, which leads to poor classification performance. Meanwhile, the distance between two clusters indicates their visual difference. If one cluster is far from the other clusters, we can easily obtain discriminative features for it, and thus assign it a small weight. Hence we assign these clusters suitable weights to make their influences as balanced as possible.

Given all these, we assign the cluster weights according to the following rule: the larger the cluster and the farther it is from the others, the smaller its weight. We use a monotone decreasing function F(x) = e^{x/(x−1)} to model this change. According to the size of the cluster and the distance distribution among the cluster centroids, we compute the weight of the cluster c_i as follows,

w_{c_i} = F(N_{c_i}) × F(∑_{j≠i, j∈{1,…,k}} d(μ_i, μ_j)),  (2)

where N_{c_i} is the number of patches in cluster c_i and d(μ_i, μ_j) is the distance between the centroids μ_i and μ_j.
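The reweighting rule of Eqs. (1) and (2) can be sketched in plain Python (the distance metric is not fixed above, so Euclidean distance is assumed; all helper names are illustrative):

```python
import math

def F(x):
    # The monotone decreasing function F(x) = e^{x/(x-1)} (for x > 1).
    return math.exp(x / (x - 1.0))

def cluster_weights(sizes, centroids):
    # Eq. (2): w_{c_i} = F(N_{c_i}) * F(sum_{j != i} d(mu_i, mu_j)).
    # Large clusters and clusters far from all others get small weights.
    k = len(centroids)
    w = []
    for i in range(k):
        d_sum = sum(math.dist(centroids[i], centroids[j])
                    for j in range(k) if j != i)
        w.append(F(sizes[i]) * F(d_sum))
    return w

def patch_weight(p_x, w_c):
    # Eq. (1): w_x = p_x . w_c, the patch's cluster-membership
    # distribution dotted with the cluster weight vector.
    return sum(p * w for p, w in zip(p_x, w_c))

# Toy example: one huge background cluster far from two disease clusters.
sizes = [500, 40, 40]
centroids = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0)]
w_c = cluster_weights(sizes, centroids)
w_x = patch_weight([0.1, 0.7, 0.2], w_c)
```

F is monotone decreasing for x > 1, so the toy distances are chosen above 1; the big, distant background cluster receives the smallest weight.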
Fig. 7. The result of the elbow method. The blue line shows how the SSE changes with K, and the orange line is the MA line.

TABLE II
PERFORMANCE COMPARISON FOR DIFFERENT TRAINING METHODS
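Fig. 7 suggests K is chosen from an SSE-versus-K elbow curve. A toy illustration of how such a curve can be computed (plain Lloyd's k-means on 1-D values with a deterministic quantile initialization; all names here are ours, not the paper's implementation):

```python
def kmeans_sse(points, k, iters=25):
    # Lloyd's k-means on scalars; returns the within-cluster sum of
    # squared errors (the quantity plotted against K in an elbow curve).
    pts = sorted(points)
    # Deterministic init: k evenly spaced points from the sorted data.
    centers = [pts[round(i * (len(pts) - 1) / max(k - 1, 1))] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sum(min((p - c) ** 2 for c in centers) for p in pts)

# Three well-separated groups: SSE drops sharply until K = 3, then flattens.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
curve = {k: kmeans_sse(data, k) for k in range(1, 6)}
```

The "elbow" is the K after which further increases stop reducing the SSE appreciably; the moving-average line in Fig. 7 smooths this curve.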
TABLE IV
ABLATION STUDY ON THE PDD271
Fig. 10. Qualitative results. From top to bottom: (a) the original image with diseased parts annotated by red boxes, (b) the feature map from the last convolution layer of VGG16, (c) the feature map from the last convolution layer of ResNet152, (d) the feature map from the last convolution layer of SENet154, (e) visualization of the proposed CRR weights for each patch. Red means high weights and blue means relatively low weights. For clarity, we only visualize the weights that are larger than 0.75. CRR considers more regions and captures more characteristics.
TABLE VII
IMPACT OF ORDERS. THE 'T', 'B', 'L', AND 'R' DENOTE THE TOP, BOTTOM, LEFT AND RIGHT, RESPECTIVELY. 'T2B, L2R' MEANS THAT THE PATCHES FROM EACH IMAGE ARE ORDERED FROM TOP TO BOTTOM AND LEFT TO RIGHT. 'RD' DENOTES A RANDOM ORDER. 'FIXED' DENOTES THAT THE ORDER OF THE PATCH LIST FOR EACH IMAGE IS FIXED, AND 'UNFIXED' DENOTES THAT IT IS UNFIXED

Compared with the feature maps from typical deep networks, we can see that the proposed reweighted maps cover more discriminative regions. VGG16 and ResNet152 tend to focus on disease-irrelevant regions while ignoring some useful information. Our approach pays attention to multiple scattered regions, which is more appropriate for plant disease recognition. The visualization results on the PDD271 further demonstrate the effectiveness of the proposed cluster-based reweighting strategy.
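For reference, the gradient-weighted class activation heatmaps [49] used in these feature-map comparisons reduce to a simple computation, sketched here on toy nested-list tensors (a minimal illustration, not the authors' implementation):

```python
def grad_cam(activations, gradients):
    # Minimal Grad-CAM: alpha_k is the gradient of the class score
    # averaged over the spatial dimensions of channel k, and the heatmap
    # is ReLU(sum_k alpha_k * A_k) over the activation maps A_k.
    h, w = len(activations[0]), len(activations[0][0])
    heat = [[0.0] * w for _ in range(h)]
    for A_k, G_k in zip(activations, gradients):
        alpha = sum(sum(row) for row in G_k) / (h * w)  # GAP of gradients
        for i in range(h):
            for j in range(w):
                heat[i][j] += alpha * A_k[i][j]
    return [[max(0.0, v) for v in row] for row in heat]  # ReLU
```

Channels whose gradients point toward the target class reinforce the map, while negatively contributing channels are suppressed by the final ReLU.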
is diseased no matter where the lesions appear. Another possible explanation is that the uncertain order is likely to enhance the power of the networks.

8) Visualization: We visualize the parts emphasized by different methods via gradient-weighted class activation heatmaps [49]. Fig. 10 shows the visualization results of some typical deep architectures, such as VGG16 and ResNet152. The reweighted maps of the proposed cluster-based region reweighting strategy are shown in Fig. 10 (e), where we only visualize the weight of a patch x when w_x ≥ 0.75.

In addition, we further show the confusion matrix of our method on the PDD271 in Fig. 11, where the vertical axis shows the ground-truth classes and the horizontal axis shows the predicted classes. Yellower colors indicate better performance. We can see that our method still does not achieve perfect performance for some plant disease categories. We enlarge specific regions to highlight the misclassified results and show some samples from the confused categories. These plant disease categories are very similar in visual appearance and texture; even humans cannot easily distinguish among them. A probable solution is to design more fine-grained visual feature learning methods or to use multi-source information from different sensors to classify these plant disease categories.

C. Experiment on PlantVillage Dataset

Besides the PDD271, we also conduct the evaluation on another publicly available benchmark dataset, the
[15] A. K. Lees, L. Sullivan, J. S. Lynott, and D. W. Cullen, "Development of a quantitative real-time PCR assay for Phytophthora infestans and its applicability to leaf, tuber and soil samples," Plant Pathol., vol. 61, no. 5, pp. 867–876, Oct. 2012.
[16] C. H. Bock, G. H. Poole, P. E. Parker, and T. R. Gottwald, "Plant disease severity estimated visually, by digital photography and image analysis, and by hyperspectral imaging," Crit. Rev. Plant Sci., vol. 29, no. 2, pp. 59–107, Mar. 2010.
[17] F. Ahmad and A. Airuddin, "Leaf lesion detection method using artificial bee colony algorithm," in Advances in Computer Science and its Applications, vol. 279. Beijing, China: Springer, 2014, pp. 989–995.
[18] S. Prasad, P. Kumar, and A. Jain, "Detection of disease using block-based unsupervised natural plant leaf color image segmentation," in Swarm, Evolutionary, and Memetic Computing. Beijing, China: Springer, 2011, pp. 399–406.
[19] A. Ramcharan, K. Baranowski, P. Mcclowsky, B. Ahmed, and D. P. Hughes, "Using transfer learning for image-based cassava disease detection," Frontiers Plant Sci., vol. 8, p. 1852, Oct. 2017.
[20] E. Mwebaze, T. Gebru, A. Frome, S. Nsumba, and J. Tusubira, "iCassava 2019 fine-grained visual categorization challenge," 2019, arXiv:1908.02900. [Online]. Available: https://arxiv.org/abs/1908.02900
[21] D. P. Hughes and M. Salathé, "An open access repository of images on plant health to enable the development of mobile disease diagnostics through machine learning and crowdsourcing," 2015, arXiv:1511.08060. [Online]. Available: https://arxiv.org/abs/1511.08060
[22] H. Yu and C. Son, "Apple leaf disease identification through region-of-interest-aware deep convolutional neural network," 2019, arXiv:1903.10356. [Online]. Available: https://arxiv.org/abs/1903.10356
[23] C. Xie et al., "Multi-level learning features for automatic classification of field crop pests," Comput. Electron. Agricult., vol. 152, pp. 233–241, Sep. 2018.
[24] X. Wu, C. Zhan, Y.-K. Lai, M.-M. Cheng, and J. Yang, "IP102: A large-scale benchmark dataset for insect pest recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 8787–8796.
[25] J. Donahue et al., "DeCAF: A deep convolutional activation feature for generic visual recognition," in Proc. Int. Conf. Mach. Learn., 2014, pp. 647–655.
[26] N. Zhang, J. Donahue, R. Girshick, and T. Darrell, "Part-based R-CNNs for fine-grained category detection," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 834–849.
[27] T. Xiao, Y. Xu, K. Yang, J. Zhang, Y. Peng, and Z. Zhang, "The application of two-level attention models in deep convolutional neural network for fine-grained image classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 842–850.
[28] M. Lam, B. Mahasseni, and S. Todorovic, "Fine-grained recognition as HSnet search for informative image parts," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 6497–6506.
[29] T.-Y. Lin, A. RoyChowdhury, and S. Maji, "Bilinear convolutional neural networks for fine-grained visual recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 6, pp. 1309–1322, Jun. 2018.
[30] S. Jiang, W. Min, L. Liu, and Z. Luo, "Multi-scale multi-view deep feature aggregation for food recognition," IEEE Trans. Image Process., vol. 29, pp. 265–276, 2020.
[31] W. Min, L. Liu, Z. Luo, and S. Jiang, "Ingredient-guided cascaded multi-attention network for food recognition," in Proc. 27th ACM Int. Conf. Multimedia, Oct. 2019, pp. 99–107.
[32] Z. Yang, T. Luo, D. Wang, Z. Hu, J. Gao, and L. Wang, "Learning to navigate for fine-grained classification," in Proc. Eur. Conf. Comput. Vis., Sep. 2018, pp. 438–454.
[33] T. Hu and H. Qi, "See better before looking closer: Weakly supervised data augmentation network for fine-grained visual classification," 2019, arXiv:1901.09891.
[34] Y. Peng, X. He, and J. Zhao, "Object-part attention model for fine-grained image classification," IEEE Trans. Image Process., vol. 27, no. 3, pp. 1487–1500, Mar. 2018.
[35] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 7132–7141.
[36] G.-J. Qi, "Hierarchically gated deep networks for semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2267–2275.
[37] G.-J. Qi, X.-S. Hua, Y. Rui, J. Tang, and H.-J. Zhang, "Image classification with kernelized spatial-context," IEEE Trans. Multimedia, vol. 12, no. 4, pp. 278–287, Jun. 2010.
[38] W. Lin et al., "Human in events: A large-scale benchmark for human-centric video analysis in complex events," 2020, arXiv:2005.04490. [Online]. Available: https://arxiv.org/abs/2005.04490
[39] S. Li, J. Li, H. Tang, R. Qian, and W. Lin, "ATRW: A benchmark for Amur tiger re-identification in the wild," in Proc. 28th ACM Int. Conf. Multimedia, Oct. 2020, pp. 2590–2598.
[40] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.
[41] Y. Chen, Y. Bai, W. Zhang, and T. Mei, "Destruction and construction learning for fine-grained image recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 5157–5166.
[42] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–14.
[43] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[44] S. Zagoruyko and N. Komodakis, "Wide residual networks," in Proc. Brit. Mach. Vis. Conf., 2016, p. 87.
[45] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2261–2269.
[46] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2818–2826.
[47] X. Zhang, Z. Li, C. C. Loy, and D. Lin, "PolyNet: A pursuit of structural diversity in very deep networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 3900–3908.
[48] C. Liu et al., "Progressive neural architecture search," in Proc. Eur. Conf. Comput. Vis., Sep. 2018, pp. 19–35.
[49] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 618–626.
[50] F. N. Iandola, M. W. Moskewicz, K. Ashraf, S. Han, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size," 2016, arXiv:1602.07360. [Online]. Available: https://arxiv.org/abs/1602.07360
[51] H. Hu and G. Qi, "State-frequency memory recurrent neural networks," in Proc. Int. Conf. Mach. Learn., vol. 70, 2017, pp. 1568–1577.
[52] X. Shu, J. Tang, G. Qi, W. Liu, and J. Yang, "Hierarchical long short-term concurrent memory for human interaction recognition," IEEE Trans. Pattern Anal. Mach. Intell., early access, Sep. 17, 2018, doi: 10.1109/TPAMI.2019.2942030.
[53] K. Chen, Y. Chen, C. Han, N. Sang, and C. Gao, "Hard sample mining makes person re-identification more efficient and accurate," Neurocomputing, vol. 382, pp. 259–267, Mar. 2020.
[54] Y. Zhao, Z. Jin, G. Qi, H. Lu, and X. Hua, "An adversarial approach to hard triplet generation," in Proc. Eur. Conf. Comput. Vis., vol. 11213, Sep. 2018, pp. 508–524.

Xinda Liu received the B.E. degree from the China University of Mining and Technology, Beijing, China, in 2013, and the M.E. degree from Ningxia University, Yinchuan, China, in 2016. He is currently pursuing the Ph.D. degree with the State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing. He is also with the Peng Cheng Laboratory. His research interests include machine learning and image processing.
Weiqing Min (Member, IEEE) received the B.E. degree from Shandong Normal University, Jinan, China, in 2008, the M.E. degree from Wuhan University, Wuhan, China, in 2010, and the Ph.D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, in 2015. He is currently an Associate Professor with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences. His current research interests include multimedia content analysis, understanding and applications, food computing, and geo-multimedia computing. He has authored or coauthored more than 40 peer-reviewed papers in relevant journals and conferences, including ACM Computing Surveys, IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON MULTIMEDIA, ACM TOMM, IEEE Multimedia Magazine, ACM Multimedia, AAAI, and IJCAI. He organized several special issues of international journals, such as IEEE Multimedia Magazine and Multimedia Tools and Applications, as a Guest Editor. He served as a TPC member of many academic conferences, including ACM MM, AAAI, and IJCAI. He was a recipient of the 2016 ACM TOMM Nicolas D. Georganas Best Paper Award and the 2017 IEEE Multimedia Magazine Best Paper Award.

Lili Wang (Member, IEEE) received the Ph.D. degree from Beihang University, Beijing, China. She is currently a Professor with the School of Computer Science and Engineering, Beihang University, where she is also a Researcher with the State Key Laboratory of Virtual Reality Technology and Systems. She is also with the Beijing Advanced Innovation Center for Biomedical Engineering. Her research interests include virtual reality, augmented reality, mixed reality, real-time, and realistic rendering.

Shuhuan Mei received the M.E. degree from the Shandong University of Science and Technology, China. His current research interests include multimedia content analysis, understanding, and applications.

Shuqiang Jiang (Senior Member, IEEE) is currently a Professor with the Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, and a Professor with the University of Chinese Academy of Sciences. He is also with the Key Laboratory of Intelligent Information Processing, CAS. His research interests include multimedia processing and semantic understanding, pattern recognition, and computer vision. He has authored or coauthored more than 150 articles in the related research topics. He was supported by the New-Star Program of Science and Technology of Beijing Metropolis in 2008, the NSFC Excellent Young Scientists Fund in 2013, and the Young top-notch talent of Ten Thousand Talent Program in 2014. He has also served as a TPC member for more than 20 well-known conferences, including ACM Multimedia, CVPR, ICCV, IJCAI, AAAI, ICME, ICIP, and PCM. He won the Lu Jiaxi Young Talent Award from the Chinese Academy of Sciences in 2012, and the CCF Award of Science and Technology in 2012. He is a Senior Member of CCF, a member of ACM, and an Associate Editor of ACM TOMM, IEEE MULTIMEDIA, and Multimedia Tools and Applications. He is the Vice Chair of the IEEE CASS Beijing Chapter and the ACM SIGMM China Chapter. He was the General Chair of ICIMCS 2015 and the Program Chair of ACM Multimedia Asia 2019 and PCM 2017.