Crop Disease Detection in Uncontrolled Lighting Conditions
Chapter 1: Introduction
1.1 Overview and Background
Food shortage is expanding at an exponential rate due to inadequate agricultural output. To
address this, farmers need to increase their productivity, and one way to do so is to reduce the
yield lost to crop diseases. The agriculture business has improved greatly thanks to
technological advancements. Evaluating crop quality by eye inspection is a tough task in terms of
time, cost, and accuracy [cite]. To overcome this problem, researchers have developed several
approaches based on technologies such as object recognition and image processing for quality
assessment. Image processing technology is used in this study to identify and classify plant
diseases [1-3]. This image processing method requires high-resolution images to detect and
classify diseases that were previously hard to capture [cite]. As a result, predicting disease
accurately and efficiently is a challenging task. By analyzing [4] and taking the required steps
to protect against such leaf diseases, our research aims to build a model with improved accuracy
that correctly diagnoses and distinguishes leaf disease at its onset. To improve the accuracy of
the model, this study examines a variety of parameters that are mainly used together with RF
models.
1.2 Motivation
According to [1], researchers use images captured in a controlled environment, on a single
background. Previous studies have collected images using indoor boxes to avoid the influence of
external lighting [2]. This helps the accuracy of the models they develop; however, it greatly
reduces the practicability of the models in a natural environment. The authors of [3] noted that
the largest repository of plant leaf datasets, Kaggle, has images which are taken on a single
background, resulting in a dataset lacking the complexity of the real natural environment.
1.3 Problem Statement
The problem to be addressed through this study is the improvement of the practicability of the
crop disease detection models being developed. The study will attempt to improve the
practicability of the models by introducing complexity to existing datasets so that they mimic
the natural environment. An improvement in the practicability of crop disease detection models
will greatly improve how these algorithms are used.
Comment:
The problem statement is not coming out clearly. I know what you want to say, but you will not
get a chance to explain to the marker. I propose that you change your motivation and make the
following statement your problem statement.
Machine learning has been used in the identification of plant diseases for some time. In order to
identify plant disease, researchers use images captured in a controlled environment, on a single
background [cite]. Previous studies have collected images using indoor boxes, avoiding the
influence of external lighting [2]. Such models have high accuracy in training and testing but
have proved to be poor in actual production when used in the natural environment, since the
lighting in such scenarios is not controlled. This study seeks to improve the practicability of
the models by introducing lighting complexity to pre-existing plant image datasets, which will
help to mimic the natural environment and consequently improve the usability of the models.
1.4.1 Aims
This paper aims to develop a model with improved practicability in distinguishing healthy leaves
from unhealthy leaves.
1.4.2 Objectives
1.5 Impact of the Research
The increase in the practicability of crop disease detection models will result in models that
work more effectively in a natural environment, since the majority of crops are not grown in
indoor boxes.
2.2.2 Dataset
In a review of imaging techniques for plant disease detection, [6] noted the limited
availability of public images for training a CNN model, which results in researchers using
transfer learning instead. The lack of variety leads to the same datasets being reused; these
lack complexity, which reduces how practical the trained models are in a natural environment.
This method was used by [7] to detect powdery mildew in wheat plants. Correlation, regression
and independent-sample t-tests were used to analyze the 32 spectral features extracted. As the
whole analysis was based on spectral data, it was not effective in the early detection of
diseases that affect the visual characteristics of a plant before altering its internal
structure.
Newer technologies, namely multispectral and hyperspectral imaging, later evolved to reduce the
shortcomings of the previous method by combining spectroscopy and imaging. In research on
strawberry leaves [8], hyperspectral data was used, and algorithms such as k-Nearest Neighbor
(kNN) and Fisher discriminant analysis (FDA) were utilized to detect whether the plants were
infected by anthracnose crown. Even though they achieved an accuracy of 70 percent, the research
had several drawbacks, including the tedious process of data collection using a
spectroradiometer, which requires experts to operate it. Further, the data collected can be
flawed due to shadows cast by sunlight, leading to reduced performance, as stated by the authors.
In another study on wheat by Zheng et al. (2019), hyperspectral data was used for analysis, as it
was found that spectral data and indices such as PRI (Photochemical Reflectance Index) and ARI
(Anthocyanin Reflectance Index) changed with plant development. The results obtained from
their study were better than those of previous research in the field.
Other researchers tuned and tested various parameters in the quest to improve accuracy, but the
inherent problems of data collection and the expert knowledge needed to understand the results
remained unaddressed, stressing the need for alternative technologies.
KNN and SVM are among the machine learning algorithms that were popular for classification and
have been used widely for disease classification in plants. As previous research in the field had
placed too much stress on data collection, Dhaware and Wanjale (2017) tried to mitigate this by
analyzing images obtained from handheld devices and incorporating image processing, including
segmentation, in their methodology. SVM was used for classification. However, the results
obtained indicated scope for improvement through techniques such as feature extraction.
An accuracy of 90 percent was obtained by Francis et al. (2016) when composed segmentation
was used to identify diseases such as quick wilt and berry spot in pepper plants. It involved
green pixel masking and similarity-based segmentation, followed by feature extraction. A
propagation neural network was used for classification. As the analysis was carried out on a
small dataset, the results are not totally reliable.
Padol and Yadav (2016) made use of the k-means algorithm for segmentation. Noise was removed
from the images using a Gaussian filter, and around 54 features were extracted for analysis.
This was, however, time consuming and tedious, and yielded results close to 90 percent, raising
the question of whether such complex procedures had any significant impact.
Global singular value decomposition (SVD) was used by Zhang and Wang (2016) in their quest to
identify cucumber diseases. It proved to be effective in feature extraction. SVM, coupled with
an improved recognition method based on the watershed algorithm, helped them achieve good and
accurate results. The whole analysis was conducted on a small dataset due to the computational
complexity of the techniques involved.
Feature extraction was done using SURF by Aravind et al. (2018), and k-means was then used to
cluster the features. The grayscale image and its occurrence matrix with histogram served as the
classifier's input in the study. A variation of this study by Maniyath et al. (2018) used the
Histogram of Oriented Gradients (HOG) as feature extractor. This made use of Hu moments,
texture and color histogram for analysis. Both studies produced good results and supported the
use of random forest when the amount of data available for analysis is small.
In general, all the studies above made use of relatively small datasets and are heavily
dependent on pre-processing for accurate classification. This is computationally expensive and
requires extensive domain knowledge [cite]. These seem to be the main reasons for the paradigm
shift observed in recent years, as more and more work in this field makes use of deep learning
techniques instead of traditional machine learning techniques. Deep learning techniques are
capable of addressing these bottlenecks and provide better results.
Studies using transfer learning include the research conducted by Shrivastava et al. (2019) on
rice plants. Here, CNN architectures were used as feature extractors, and their outputs were
used by an SVM for classification. The models produced accuracies of 92 percent, which is
commendable. However, the dataset used was of size 619, which is too small for analysis. These
limitations were addressed through data augmentation by Coulibaly et al. (2019), which improved
the results to a certain extent. Using CNNs as feature extractors can, however, be
computationally intensive. Further, the training time of the models is not discussed by the
authors, which could have helped us understand their utility better.
In research on rice plants by Lu et al. (2017), CNNs with different numbers of layers were
utilised to identify and classify rice diseases. Although the techniques achieved good accuracy,
the tests were performed on a small dataset of 500 images. Hence, the results obtained must be
taken with a pinch of salt, and further analysis is needed.
An extension of this research was done by Ferentinos (2018) using a dataset of different species
of plants. CNN-based architectures such as AlexNet, VGG and GoogLeNet were trained from scratch,
and the resultant models obtained accuracies of about 99 percent. In a similar analysis by
Mohanty et al. (2016), transfer learning versions of AlexNet and GoogLeNet were used for
classification of various plant diseases, resulting in accuracies similar to the previous study.
However, both of these models performed poorly when tested on datasets other than the ones under
study. The authors suggest that this can be mitigated using data augmentation and by training
some layers of the architectures with problem-specific data.
The current research aims to address the shortcomings of previous works in this area by making
use of data augmentation and transfer learning with trainable layers. The performance of newer
algorithms such as gradient boosting and XGBoost shall be compared against these to find an
optimal model for identifying and classifying maize disease. The methodology and implementation
of these techniques are elaborated in later sections.
2.3 Conclusion
Chapter 3: Methodology
3.1 Introduction
Experimental Group    S1    X    S2
Control Group         S3    C    S4
Where:
C: Treatment of the control group using the complex dataset
3.5.1 Tests
A test is a method of measuring performance in a given setting (Brown, 2004). The researcher will
use two kinds of test: a pretest and a posttest. The pretest consisted of training a model
without the complex dataset and recording the results. The posttest was done after introducing
the complexity into the dataset.
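How the complex dataset is produced is not specified in this section. As a minimal sketch under stated assumptions, the snippet below shows one way lighting and background variation could be added to an existing image with NumPy. The helper names `adjust_lighting` and `replace_background`, the reading of a lighting "unit" as an additive brightness offset on 8-bit pixels, and the toy image are all hypothetical and are not taken from the study.

```python
import numpy as np


def adjust_lighting(image, units):
    """Brighten an 8-bit RGB image by adding `units` to every channel.

    Assumption: one lighting "unit" is one step on the 0-255 pixel scale.
    """
    shifted = image.astype(np.int16) + int(units)
    # Clip so that bright pixels saturate at 255 instead of wrapping around.
    return np.clip(shifted, 0, 255).astype(np.uint8)


def replace_background(image, mask, color):
    """Paint the pixels where `mask` is True (background) with an RGB color."""
    result = image.copy()
    result[mask] = np.array(color, dtype=np.uint8)
    return result


# Hypothetical 4x4 image: a green "leaf" patch on a uniform grey background.
image = np.full((4, 4, 3), 128, dtype=np.uint8)
image[1:3, 1:3] = (30, 160, 40)
background = np.all(image == 128, axis=-1)

# One variant per lighting level, plus one variant on a black background.
variants = {units: adjust_lighting(image, units) for units in (3, 5, 7, 10)}
on_black = replace_background(image, background, (0, 0, 0))
```

In a pretest and posttest design, the unmodified images would feed the pretest and variants such as these would feed the posttest; a real pipeline would also need a leaf segmentation step to obtain the background mask.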
3.6 Conclusion
I am not going to approve Research Methodology Chapter with scant information on the
research methodology. Please redo this chapter and provide all the pertinent headings that
include the data source, the correct methodology, population (size of dataset) and steps
taken in cleaning the data (EDA).
For methodology, since this is a purely Data Science problem, use CRISP-DM, TDSP, KDD or
any other framework that was created for solving data science problems.
Figure 4.1.1: Proposed model
Using Google Colab as the environment and Python as the programming language, this thesis
implemented three ML algorithms, namely RF, SVM and KNN, in order to compare models trained on the
non-complex dataset with models trained on the complex dataset. The data was imported from the
Kaggle PlantVillage dataset.
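The comparison described above can be sketched with scikit-learn as follows. This is an illustration rather than the code used in the study: the function name `compare_classifiers`, the hyperparameters, the 80/20 split and the synthetic features (standing in for features extracted from PlantVillage images) are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def compare_classifiers(features, labels, seed=0):
    """Train RF, SVM and KNN on one split and return test accuracy per model."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    # Hyperparameters are illustrative defaults, not the study's settings.
    models = {
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM": SVC(kernel="rbf"),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(x_test))
    return scores


# Synthetic stand-in for leaf-image features (healthy vs. unhealthy labels).
features, labels = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
scores = compare_classifiers(features, labels)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

Calling the same function once on features from the non-complex dataset and once on features from the complex dataset would give the two sets of accuracies being compared.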
[Figure: accuracy (y-axis, 0.86 to 0.98) of RF, KNN and SVM at 1, 3, 5, 7 and 10 units (x-axis)]
Lighting    Accuracy                            Complexity
            RF          KNN         SVM
ordinary    0.98125     0.934375    0.934375    0
3units      0.9625      0.9343      0.925       1
5units      0.978125    0.934375    0.925       1
7units      0.975       0.940625    0.9125      1
10units     0.96875     0.94375     0.940625    1
[Figure: RF, KNN and SVM plotted at 1, 3, 5, 7 and 10 units]
We cross-referenced the lighting and the background, and the results are as follows:
Background  Lighting    Accuracy                            Complexity
                        RF          KNN         SVM
ordinary    ordinary    0.98125     0.934375    0.934375    1
            3units      0.9625      0.9343      0.925       1
            5units      0.978125    0.934375    0.925       1
            7units      0.975       0.940625    0.9125      1
            10units     0.96875     0.94375     0.940625    1
black       ordinary    0.97125     0.9421      0.92437     1
            3units      0.9525      0.95465     0.925       2
            5units      0.928125    0.956375    0.925       2
            7units      0.915       0.970625    0.9325      2
            10units     0.905       0.98375     0.94036     2
red         ordinary    0.961       0.92325     0.932       1
            3units      0.9725      0.9243      0.915       2
            5units      0.978125    0.93652     0.912       2
            7units      0.945       0.94252     0.9125      2
            10units     0.9875      0.9362      0.940625    2
yellow      ordinary    0.973       0.934       0.934375    1
            3units      0.9625      0.932       0.925       2
            5units      0.972       0.934       0.925       2
            7units      0.975       0.95        0.9125      2
            10units     0.96875     0.955       0.91625     2
With the introduction of varying backgrounds and varying light, we have seen fluctuations in
performance, meaning that accuracy is affected by the light. RF appears to be negatively affected by
the increase in light, most clearly on the black background, whereas KNN seemingly gains accuracy
as lighting increases.
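To make the size of this effect concrete, the short calculation below takes the black-background rows from the results table and computes the change in accuracy between ordinary lighting and 10 units for each classifier. The values are copied from the table; the helper name `accuracy_change` and the dictionary layout are illustrative only.

```python
# Accuracy on the black background, copied from the results table.
black_background = {
    "ordinary": {"RF": 0.97125, "KNN": 0.9421, "SVM": 0.92437},
    "10units": {"RF": 0.905, "KNN": 0.98375, "SVM": 0.94036},
}


def accuracy_change(results, start, end):
    """Return the accuracy difference (end minus start) for each classifier."""
    return {
        model: round(results[end][model] - results[start][model], 5)
        for model in results[start]
    }


change = accuracy_change(black_background, "ordinary", "10units")
for model, delta in change.items():
    print(f"{model}: {delta:+.5f}")
```

On this background RF loses about 0.066 in accuracy while KNN gains about 0.042 and SVM gains about 0.016. The red and yellow rows do not show the same steady trend for RF, so the effect should be read as background dependent.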
4.4 Conclusion
Based on the experiments done in an attempt to increase the complexity of the dataset in order to
improve the practicality of the trained models, we have noticed that accuracy is greatly affected
by the increase in complexity, which was expected. Some algorithms are more sensitive than
others, and some, such as KNN, seem to gain accuracy instead of losing it.
Chapter 5: Conclusion and Future Work
5.1 Conclusion
Based on the results presented in the previous chapter and the work that was done throughout
this study, we have seen that accuracy is greatly affected by the complexity of the dataset.
5.1.3 Recommendations
References
[1] J. Liu and X. Wang, “Plant diseases and pests detection based on deep learning: a review,” Plant
Methods, vol. 17, no. 1, Dec. 2021, doi: 10.1186/s13007-021-00722-9.
[2] F. Martinelli et al., “Advanced methods of plant disease detection. A review,” Agronomy for
Sustainable Development, vol. 35, no. 1, pp. 1–25, Sep. 2014, doi: 10.1007/S13593-014-0246-1.
[5] J. G. A. Barbedo, “Factors influencing the use of deep learning for plant disease recognition,”
Biosyst Eng, vol. 172, pp. 84–91, Aug. 2018, doi: 10.1016/j.biosystemseng.2018.05.013.
[6] V. Singh, N. Sharma, and S. Singh, “A review of imaging techniques for plant disease detection,”
Artificial Intelligence in Agriculture, vol. 4, pp. 229–242, Jan. 2020, doi:
10.1016/J.AIIA.2020.10.002.
[7] M. Z. Y. C. B Zhang, “Crop pest identification based on spatial pyramid pooling and deep
convolution neural network,” Trans Chin Soc Agric Eng, vol. 35, no. 19, pp. 209–215, 2019.
[8] S. P. S. G. S Kaur, “Plants disease identification and classification through leaf images: a survey,”
Arch Comput Methods Eng, vol. 26, no. 4, pp. 1–24, 2018.
[9] A. A. Bernardes et al., “Identification of foliar diseases in cotton crop,” Lecture Notes in
Computational Vision and Biomechanics, vol. 8, pp. 67–85, 2013, doi: 10.1007/978-94-007-0726-9_4.
[10] V. B. Devi, R. Prabavathi, P. Subha, and M. Meenaloshini, “An Efficient and Robust Random Forest
Algorithm for Crop Disease Detection,” 2022 International Conference on Communication,
Computing and Internet of Things, IC3IoT 2022 - Proceedings, 2022, doi:
10.1109/IC3IOT53935.2022.9767937.
[11] M. Prabhakar, R. Purushothaman, and D. P. Awasthi, “Deep learning based assessment of disease
severity for early blight in tomato crop,” Multimed Tools Appl, vol. 79, no. 39–40, pp. 28773–
28784, Oct. 2020, doi: 10.1007/S11042-020-09461-W.
[12] A. Picon, M. Seitz, A. Alvarez-Gila, P. Mohnke, A. Ortiz-Barredo, and J. Echazarra, “Crop conditional
Convolutional Neural Networks for massive multi-crop plant disease classification over cell phone
acquired images taken on real field conditions,” Comput Electron Agric, vol. 167, Dec. 2019, doi:
10.1016/J.COMPAG.2019.105093.
[13] C. Yang, “Remote Sensing and Precision Agriculture Technologies for Crop Disease Detection and
Management with a Practical Application Example,” Engineering, vol. 6, no. 5, pp. 528–532, May
2020, doi: 10.1016/J.ENG.2019.10.015.
[14] M. Ouhami, A. Hafiane, Y. Es-Saady, M. el Hajji, and R. Canals, “Computer Vision, IoT and Data
Fusion for Crop Disease Detection Using Machine Learning: A Survey and Ongoing Research,”
Remote Sensing, vol. 13, no. 13, p. 2486, Jun. 2021, doi: 10.3390/RS13132486.
[15] Y. Zhao et al., “An effective automatic system deployed in agricultural Internet of Things using
Multi-Context Fusion Network towards crop disease recognition in the wild,” Applied Soft
Computing Journal, vol. 89, Apr. 2020, doi: 10.1016/J.ASOC.2020.106128.
[16] “Remote Sensing | Free Full-Text | Computer Vision, IoT and Data Fusion for Crop Disease
Detection Using Machine Learning: A Survey and Ongoing Research.”
https://www.mdpi.com/2072-4292/13/13/2486 (accessed Nov. 24, 2022).
[17] A. dos Santos Ferreira, D. Matte Freitas, G. Gonçalves da Silva, H. Pistori, and M. Theophilo Folhes,
“Weed detection in soybean crops using ConvNets,” Comput Electron Agric, vol. 143, pp. 314–324,
Dec. 2017, doi: 10.1016/J.COMPAG.2017.10.027.
[21] T. van Klompenburg, A. Kassahun, and C. Catal, “Crop yield prediction using machine learning: A
systematic literature review,” Comput Electron Agric, vol. 177, Oct. 2020, doi:
10.1016/J.COMPAG.2020.105709.
[22] T. Domingues, T. Brandão, and J. C. Ferreira, “Machine Learning for Detection and Prediction of
Crop Diseases and Pests: A Comprehensive Survey,” Agriculture, vol. 12, no. 9, p. 1350, Sep. 2022,
doi: 10.3390/agriculture12091350.
[23] J. Liu, I. Abbas, and R. S. Noor, “Development of Deep Learning-Based Variable Rate Agrochemical
Spraying System for Targeted Weeds Control in Strawberry Crop,” Agronomy, vol. 11, no. 8, p. 1480,
Jul. 2021, doi: 10.3390/AGRONOMY11081480.