ABSTRACT

India loses 35% of its annual crop yield to plant diseases. Early detection of plant diseases remains difficult due to the lack of lab infrastructure and expertise. In this paper, we explore the possibility of computer vision approaches for scalable and early plant disease detection. The lack of a sufficiently large-scale non-lab dataset remains a major challenge for enabling vision-based plant disease detection. Against this background, we present PlantDoc: a dataset for visual plant disease detection. Our dataset contains 2,598 data points in total across 13 plant species and up to 17 classes of diseases, involving approximately 300 human hours of effort in annotating internet-scraped images. To show the efficacy of our dataset, we learn 3 models for the task of plant disease classification. Our results show that modelling using our dataset can increase the classification accuracy by up to 31%. We believe that our dataset can help reduce the entry barrier of computer vision techniques in plant disease detection.

KEYWORDS

Deep Learning, Object Detection, Image Classification

ACM Reference Format:
Davinder Singh*, Naman Jain*, Pranjali Jain*, Pratik Kayal*, Sudhakar Kumawat, and Nipun Batra. 2020. PlantDoc: A Dataset for Visual Plant Disease Detection. In 7th ACM IKDD CoDS and 25th COMAD (CoDS COMAD 2020), January 5–7, 2020, Hyderabad, India. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3371158.3371196

*Equal Contribution.

1 INTRODUCTION

Annually, the Earth's population increases by about 1.6%, and so does the demand for plant products of every kind [16]. The protection of crops against plant diseases has a vital role to play in meeting the growing demand for food quality and quantity [22]. In terms of economic value, plant diseases alone cost the global economy around US$220 billion annually [1]. According to the Indian Council of Agricultural Research, more than 35% of crop production is lost every year due to pests and diseases [15]. Food security is threatened by an alarming increase in the number of outbreaks of pests and plant diseases. These diseases jeopardize food security and have broad economic, social, and environmental impacts [5].

Timely disease detection in plants remains a challenging task for farmers. They do not have many options other than consulting fellow farmers or the Kisan helpline [17]. Expertise in plant diseases is necessary for an individual to identify diseased leaves. Furthermore, in most cases lab infrastructure is needed to identify a diseased leaf.

In this work, we explore the possibility of using computer vision for scalable and cost-effective plant disease detection. Computer vision has made tremendous progress in the past few years through advances in deep convolutional neural networks. While training large neural networks can be very time consuming, the trained models can classify images very quickly, which also makes them suitable for consumer applications on smartphones. Image processing for detecting plant diseases opens up new avenues to combine deep learning approaches with real-world problems in agriculture, and hence facilitates advancements in agricultural knowledge, crop yield, and disease control.

The majority of existing vision-based solutions require high-resolution images with a plain background. In contrast, since most Indian farmers use low-end mobile devices under natural background and lighting conditions, we focus on images taken in natural environmental conditions with non-trivial background noise and aim to provide the best possible query resolution for crops and plants.

Against this background, we highlight our two main contributions: i) the development of PlantDoc, a dataset of 2,598 images across 13 plant species and 27 classes (17 disease and 10 healthy); and ii) benchmarking the curated dataset and showing its utility for disease detection in non-controlled environments. To the best of our knowledge, this is the first such dataset containing data from non-controlled settings.

We evaluated our dataset using the classification and object detection architectures described in Section 4 to establish the need for a dataset collected in non-controlled settings. The results suggested that a lab-controlled dataset cannot be used to classify or detect images in real-world scenarios. We found that fine-tuning the models on PlantDoc reduces the classification error by up to 31%. Thus, our dataset can potentially be used to build an application which detects and classifies 27 plant disease/healthy classes efficiently.
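As a concrete illustration of the transfer-learning recipe referred to above, the sketch below fine-tunes an ImageNet-pretrained MobileNet [9] backbone on a PlantDoc-style folder of leaf images with Keras. It is a minimal sketch, not the authors' training code: the directory layout, backbone choice, epoch counts, and learning rates are illustrative assumptions (TensorFlow >= 2.9 is assumed for tf.keras.utils.image_dataset_from_directory).

```python
# Minimal transfer-learning sketch (illustrative; not the authors' pipeline).
# Assumes images organised as plantdoc/{train,val}/<class_name>/*.jpg
# with 27 class sub-folders; all hyperparameters are placeholders.
import tensorflow as tf

IMG_SIZE = (224, 224)
NUM_CLASSES = 27

train_ds = tf.keras.utils.image_dataset_from_directory(
    "plantdoc/train", image_size=IMG_SIZE, batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "plantdoc/val", image_size=IMG_SIZE, batch_size=32)

# ImageNet-pretrained convolutional base, frozen at first so that only
# the new 27-way classification head is trained.
base = tf.keras.applications.MobileNet(
    include_top=False, weights="imagenet",
    input_shape=IMG_SIZE + (3,), pooling="avg")
base.trainable = False

inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = tf.keras.applications.mobilenet.preprocess_input(inputs)
x = base(x, training=False)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=5)

# Then unfreeze the backbone and continue at a much lower learning rate
# to fine-tune the whole network on the PlantDoc images.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=5)
```

Training the new head first and only then unfreezing the backbone at a lower learning rate is a standard way to adapt a pretrained network to a comparatively small dataset such as PlantDoc without overwriting the pretrained features.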
Figure 1: Samples from various classes in the PlantDoc Dataset show the gap between lab-controlled (PVD) and real-life (PlantDoc) images.

2 RELATED WORK

Our related work can be broadly categorized into: i) techniques for plant disease detection; and ii) datasets advancing research in plant disease detection.

2.1 Techniques for plant disease detection

Prior work by Sankaran et al. [19] proposed using reliable sensors for monitoring health and diseases in plants under field conditions.
However, plant disease detection using sensors has the potential to benefit only a few farmers because of the substantial hardware cost and the lack of expertise needed to operate such sensors. In contrast, prior work by Patil et al. [18] extracted shape features for disease detection in sugarcane leaves, obtaining a final average accuracy of 98.60%. In similar work, Patil et al. [3] used texture features, namely inertia, homogeneity, and correlation, obtained by calculating the gray-level co-occurrence matrix of the image, together with color extraction, for disease detection on maize leaves. Recent work [8] has looked into neural networks for the identification of three different legume species based on the morphological patterns of leaf veins. Likewise, feature extraction and a Neural Network Ensemble (NNE) have been used for recognizing tea leaf diseases with a final testing accuracy of 91% [25]. A host of other recent works have looked at convolutional neural network variants for disease detection using plant leaf images [7, 21]. These works are limited to a particular crop, which is a significant limitation. Also, the datasets used in these works have not been made public, thereby impacting reproducibility.
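To make the classical texture-feature pipeline above concrete, the following sketch computes the gray-level co-occurrence matrix (GLCM) statistics named in the text — inertia (exposed as "contrast" in scikit-image), homogeneity, and correlation — for a single leaf image. It is an illustrative sketch, not the code of Patil et al. [3], and assumes scikit-image >= 0.19 (earlier releases spell the functions greycomatrix/greycoprops).

```python
# Hedged sketch: GLCM texture features (inertia/contrast, homogeneity,
# correlation) for one leaf image. Assumes an RGB input image and
# scikit-image >= 0.19.
import numpy as np
from skimage import color, img_as_ubyte, io
from skimage.feature import graycomatrix, graycoprops

def glcm_features(image_path):
    # Load the image and convert it to an 8-bit gray-level array.
    gray = img_as_ubyte(color.rgb2gray(io.imread(image_path)))

    # Co-occurrence matrix for a 1-pixel offset in four directions,
    # symmetric and normalised, over 256 gray levels.
    glcm = graycomatrix(gray, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)

    # "Inertia" in the text corresponds to the GLCM contrast statistic.
    return {
        "inertia": graycoprops(glcm, "contrast").mean(),
        "homogeneity": graycoprops(glcm, "homogeneity").mean(),
        "correlation": graycoprops(glcm, "correlation").mean(),
    }

# Example usage: features = glcm_features("diseased_leaf.jpg")
```

Such hand-crafted features are typically fed to a classical classifier, which is the main contrast with the end-to-end convolutional approaches discussed above.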
2.2 Datasets for plant disease detection

To the best of our knowledge, the PlantVillage dataset (PVD) [14] is the only public dataset for plant disease detection. The dataset curators created an automated system using GoogleNet [23] and AlexNet [12] for disease detection, achieving an accuracy of 99.35%. However, the images in the PlantVillage dataset are taken in laboratory setups and not in the real conditions of cultivation fields, due to which their efficacy in the real world is likely to be poor. In contrast, we curate real-life images of healthy and diseased plants to create a publicly available dataset.

Figure 2: Statistics of PlantDoc Dataset (frequency of images per leaf class).

3 THE PLANTDOC DATASET

The PlantVillage dataset contains images taken under controlled settings. This limits its effectiveness for detecting diseases because, in reality, plant images may contain multiple leaves against different types of backgrounds and under varying lighting conditions (shown in Figure 1). Against this background, we now describe our curated dataset and discuss the techniques used for curation.

3.1 Data Collection

To account for the intricacies of the real world, we require models trained on real-life images. This fact motivated us to create a dataset by downloading images from Google Images and Ecosia [6] for accurate plant disease detection in the farm setting. We downloaded images from the internet since collecting large-scale plant disease data through fieldwork requires enormous effort. We collected about 20,900 images by using the scientific and common names of the 38 classes mentioned in the dataset by Mohanty et al. [14].

Four users filtered the images by selecting images based on their metadata on the website and the guidelines mentioned on APSNet [2]. APS compiled a list of peer-reviewed literature corresponding to each plant disease. We referred to APS' prior literature and classified images accordingly. Some of the most important factors for classification were the color, area, and density of the diseased part and the shape of the species. We removed inappropriate images (such as non-leaf, lab-controlled, and out-of-scope images) and duplicate images that appeared across classes due to web search. Every image was checked by two individuals according to the guidelines to reduce labeling errors. Finally, to have sufficient training samples, we removed the classes with fewer than 50 images. Figure 2 shows the resulting distribution of images across the leaf classes.
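Two of the curation steps above — removing duplicate downloads and dropping classes with fewer than 50 images — lend themselves to a short script. The sketch below is illustrative only: the directory layout and the use of MD5 content hashing for duplicate detection are our assumptions, not a description of the authors' actual tooling (which also involved manual checks).

```python
# Illustrative curation sketch: (1) delete exact-duplicate image files by
# content hash, (2) drop classes with fewer than 50 remaining images.
# Layout assumption: raw_images/<class_name>/<image files>.
import hashlib
from pathlib import Path

ROOT = Path("raw_images")
MIN_IMAGES_PER_CLASS = 50

seen_hashes = set()
for image_path in sorted(ROOT.glob("*/*")):
    digest = hashlib.md5(image_path.read_bytes()).hexdigest()
    if digest in seen_hashes:
        image_path.unlink()   # exact duplicate, possibly in another class
    else:
        seen_hashes.add(digest)

for class_dir in sorted(ROOT.iterdir()):
    if not class_dir.is_dir():
        continue
    images = list(class_dir.glob("*"))
    if len(images) < MIN_IMAGES_PER_CLASS:
        # Too few samples to train on reliably; drop the whole class.
        for image_path in images:
            image_path.unlink()
        class_dir.rmdir()
```

Content hashing only catches byte-identical duplicates; near-duplicates of the same photo from different sources would still need perceptual hashing or manual review, which is consistent with the two-person check described above.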
REFERENCES

[1] G. N. Agrios. 2005. Plant Pathology (5th ed.). Elsevier Academic Press, Burlington, MA, USA, 79–103.
[2] APSNet. 2019. Resources for Plant Diseases. https://www.apsnet.org/edcenter/resources/commonnames/Pages/default.aspx
[3] Sanjay B Patil and Shrikant K Bodhe. 2011. Betel Leaf Area Measurement Using Image Processing. International Journal on Computer Science and Engineering (IJCSE) 3 (2011).
[4] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.
[5] Messe Düsseldorf. [n. d.]. SAVE FOOD. https://www.messe-duesseldorf.com/cgi-bin/md_home/lib/pub/tt.cgi/SAVE_FOOD.html?oid=121&lang=2&ticket=g_u_e_s_t
[6] Ecosia. 2019. Search Engine. https://www.ecosia.org/?c=en
[7] Alvaro Fuentes, Sook Yoon, Sang Kim, and Dong Park. 2017. A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition. Sensors 17, 9 (2017), 2022.
[8] Guillermo L Grinblat, Lucas C Uzal, Mónica G Larese, and Pablo M Granitto. 2016. Deep learning for plant identification using vein morphological patterns. Computers and Electronics in Agriculture 127 (2016), 418–424.
[9] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017).
[10] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs/1704.04861 (2017). http://arxiv.org/abs/1704.04861
[11] Alex Krizhevsky et al. 2009. Learning multiple layers of features from tiny images. Technical Report. Citeseer.
[12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.
[13] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. 2016. SSD: Single shot multibox detector. In European Conference on Computer Vision. Springer, 21–37.
[14] Sharada P Mohanty, David P Hughes, and Marcel Salathé. 2016. Using deep learning for image-based plant disease detection. Frontiers in Plant Science 7 (2016), 1419.
[15] T Mohapatra. 2018. ICAR News July–September 2018. Published in monthly newsletter. https://www.icar.org.in/sites/default/files/ICARNewsJulySeptember2018.pdf
[16] E-C Oerke, H-W Dehne, Fritz Schönbeck, and Adolf Weber. 2012. Crop Production and Crop Protection: Estimated Losses in Major Food and Cash Crops. Elsevier.
[17] Government of India. 2019. Kisan Knowledge Management System. https://dackkms.gov.in/account/login.aspx
[18] Sanjay B Patil and Shrikant K Bodhe. 2011. Leaf disease severity measurement using image processing. International Journal of Engineering and Technology 3, 5 (2011), 297–301.
[19] Sindhuja Sankaran, Ashish Mishra, Reza Ehsani, and Cristina Davis. 2010. A review of advanced techniques for detecting plant diseases. Computers and Electronics in Agriculture 72, 1 (2010), 1–13.
[20] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[21] Srdjan Sladojevic, Marko Arsenovic, Andras Anderla, Dubravko Culibrk, and Darko Stefanovic. 2016. Deep neural networks based recognition of plant diseases by leaf image classification. Computational Intelligence and Neuroscience 2016 (2016).
[22] Richard N Strange and Peter R Scott. 2005. Plant disease: a threat to global food security. Annual Review of Phytopathology 43 (2005), 83–116.
[23] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1–9.
[24] Tzutalin. 2015. LabelImg. Free software: MIT License. https://github.com/tzutalin/labelImg
[25] Zhi-Hua Zhou and S. F. Chen. 2002. Neural network ensemble. Chinese Journal of Computers 25, 1 (2002), 1–8.