Singh 2020
ABSTRACT
India loses 35% of its annual crop yield to plant diseases. Early detection of plant diseases remains difficult due to the lack of lab infrastructure and expertise. In this paper, we explore the possibility of using computer vision approaches for scalable and early plant disease detection. The lack of a sufficiently large-scale non-lab dataset remains a major challenge for enabling vision-based plant disease detection. Against this background, we present PlantDoc: a dataset for visual plant disease detection. Our dataset contains 2,598 data points in total across 13 plant species and up to 17 classes of diseases, involving approximately 300 human hours of effort in annotating internet-scraped images. To show the efficacy of our dataset, we learn 3 models for the task of plant disease classification. Our results show that modelling using our dataset can increase the classification accuracy by up to 31%. We believe that our dataset can help reduce the entry barrier of computer vision techniques in plant disease detection.

KEYWORDS
Deep Learning, Object Detection, Image Classification

1 INTRODUCTION
Annually, the Earth's population increases by about 1.6%, and so does the demand for plant products of every kind [16]. The protection of crops against plant diseases plays a vital role in meeting the growing demand for food quality and quantity [22]. In terms of economic value, plant diseases alone cost the global economy around US$220 billion annually [1]. According to the Indian Council of Agricultural Research, more than 35% of crop production is lost every year to pests and diseases [15]. Food security is threatened by an alarming increase in the number of outbreaks of pests and plant diseases. These diseases jeopardize food security and have broad economic, social, and environmental impacts [5].

Timely disease detection in plants remains a challenging task for farmers. They have few options other than consulting fellow farmers or the Kisan helpline [17]. Identifying diseased leaves requires expertise in plant diseases and, in most cases, lab infrastructure.

In this work, we explore the possibility of using computer vision for scalable and cost-effective plant disease detection. Computer vision has made tremendous progress in the past few years, driven by advances in deep convolutional neural networks. While training large neural networks can be very time-consuming, the trained models can classify images very quickly, which also makes them suitable for consumer applications on smartphones. Image processing for detecting plant diseases opens up new avenues to combine deep learning approaches with real-world problems in agriculture, and hence facilitates advancements in agricultural knowledge, crop yield, and disease control.

The majority of existing vision-based solutions require high-resolution images with a plain background. In contrast, as the

Against this background, we highlight our two main contributions: (i) the development of PlantDoc, a dataset of 2,598 images across 13 plant species and 27 classes (17 disease classes and 10 healthy classes); and (ii) benchmarking the curated dataset and showing its utility for disease detection in non-controlled environments. To the best of our knowledge, this is the first such dataset containing data from non-controlled settings.

We evaluated our dataset using various classification and object detection architectures described in Section 4 to establish the need for a dataset collected in non-controlled settings. The results suggest that a lab-controlled dataset cannot be used to classify or detect diseases in images taken in real-world scenarios. We found that fine-tuning the models on PlantDoc reduces the classification error by up to 31%. Thus, our dataset can potentially be used to build an application that efficiently detects and classifies 27 plant disease/healthy classes.
Figure 1: Samples from various classes in the PlantDoc Dataset show the gap between lab-controlled and real-life images
However, plant disease detection using sensors has the potential to benefit only a few farmers because of the substantial hardware cost and the lack of expertise needed to operate such sensors. In contrast, prior work by Patil et al. [18] extracted shape features for disease detection in sugarcane leaves, obtaining a final average accuracy of 98.60%. In similar work, Patil et al. [3] used texture features, namely inertia, homogeneity, and correlation, obtained by calculating the gray-level co-occurrence matrix (GLCM) of the image, together with color extraction for disease detection on maize leaves. Recent work [8] has looked into neural networks for identifying three different legume species based on the morphological patterns of leaf veins. Likewise, feature extraction and a Neural Network Ensemble (NNE) have been used for recognizing tea leaf diseases with a final testing accuracy

images [7, 21]. These works are limited to a particular crop, which is a significant limitation. Also, the datasets used in these works have not been made public, which hampers reproducibility.
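To make the GLCM texture features mentioned above concrete, the following is a minimal sketch that builds a normalized gray-level co-occurrence matrix for a single pixel offset and derives inertia (also called contrast), homogeneity, and correlation from it. It is not the implementation used by Patil et al. [3]; every parameter here is our own assumption: the offset (one pixel to the right), symmetric pair counting, the number of gray levels, the homogeneity variant 1/(1+(i-j)^2) (some texts use 1/(1+|i-j|)), and the convention of reporting correlation as 1.0 for a constant patch. The function names `glcm` and `texture_features` are hypothetical, and a real leaf image would first have to be converted to grayscale and quantized to a small number of levels.

```python
import math


def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy).

    `image` is a list of rows of integer gray levels in [0, levels).
    Entry [i][j] is the fraction of pixel pairs with level i at (r, c)
    and level j at (r + dy, c + dx).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # also count the pair in the reverse direction
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Inertia (contrast), homogeneity and correlation of a normalized GLCM `p`."""
    n = len(p)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    # Inertia weights each pair by the squared gray-level difference.
    inertia = sum(p[i][j] * (i - j) ** 2 for i, j in pairs)
    # Homogeneity is large when mass concentrates near the diagonal.
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i, j in pairs)
    # Correlation measures linear dependence between neighbouring gray levels.
    mu_i = sum(i * p[i][j] for i, j in pairs)
    mu_j = sum(j * p[i][j] for i, j in pairs)
    var_i = sum(p[i][j] * (i - mu_i) ** 2 for i, j in pairs)
    var_j = sum(p[i][j] * (j - mu_j) ** 2 for i, j in pairs)
    cov = sum(p[i][j] * (i - mu_i) * (j - mu_j) for i, j in pairs)
    # Assumed convention: a constant patch has zero variance, so report 1.0.
    if var_i > 0 and var_j > 0:
        correlation = cov / math.sqrt(var_i * var_j)
    else:
        correlation = 1.0
    return {"inertia": inertia, "homogeneity": homogeneity, "correlation": correlation}


if __name__ == "__main__":
    # A 4x4 patch already quantized to 4 gray levels (0-3).
    patch = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 2, 2, 2],
        [2, 2, 3, 3],
    ]
    feats = texture_features(glcm(patch, levels=4))
    print({name: round(value, 3) for name, value in feats.items()})
    # {'inertia': 0.583, 'homogeneity': 0.808, 'correlation': 0.72}
```

In practice, such features are often averaged over several offsets (for example 0°, 45°, 90°, and 135°) to reduce directional bias, and libraries such as scikit-image expose the same computation through `graycomatrix` and `graycoprops`.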
2.2 Datasets for plant disease detection
The PlantVillage dataset (PVD) [14] is, to the best of our knowledge, the only public dataset for plant disease detection. The dataset curators created an automated system using GoogLeNet [23] and AlexNet [12] for disease detection, achieving an accuracy of 99.35%. However, the images in the PlantVillage dataset were taken in laboratory setups rather than under real conditions in cultivation fields, so their efficacy in the real world is likely to be poor. In contrast, we curate real-life images of healthy and diseased plants to create a publicly available dataset.

Figure 2: Statistics of PlantDoc Dataset (plot of the frequency of images across leaf classes)

3 THE PLANTDOC DATASET
The PlantVillage dataset contains images taken under controlled settings. This limits its effectiveness for detecting diseases because, in reality, plant images may contain multiple leaves against different types of backgrounds and under varying lighting conditions (shown in Figure 1). Against this background, we now describe our curated dataset and discuss the techniques used for curation.

3.1 Data Collection
To account for the intricacies of the real world, we require models trained on real-life images. This motivated us to create a dataset by downloading images from Google Images and Ecosia [6] for accurate plant disease detection in the farm setting. We downloaded images from the internet since collecting large-scale plant disease data through fieldwork requires enormous effort. We collected about 20,900 images by using the scientific and common names of the 38 classes in the dataset by Mohanty et al. [14].

Four users filtered the images based on their metadata on the source website and the guidelines on APSNet [2]. APS compiled a list of peer-reviewed literature corresponding to each plant disease; we referred to this literature and classified the images accordingly. Some of the most important factors for classification were the color, area, and density of the diseased part and the shape of the species. We removed inappropriate images (such as non-leaf plant, lab-controlled, and out-of-scope images) as well as duplicate images that appeared across classes as a result of the web search. Every image was checked by two individuals according to the guidelines to reduce labeling errors. Finally, to have sufficient training samples, we removed the classes with fewer than 50 images. Figure 2 shows the
