
PLANT LEAF DISEASE DETECTION

Classification of Leaf Images into Defective and Non-Defective Stages
A PROJECT REPORT

Submitted by

JOSE RONALDO A – 312320205061


KEVIN HARRIS D - 312320205069

of

BACHELOR OF TECHNOLOGY

in

INFORMATION TECHNOLOGY

St. JOSEPH’S COLLEGE OF ENGINEERING


(An Autonomous Institution)

St. Joseph’s Group of Institutions


Jeppiaar Educational Trust
OMR, Chennai 600 119

ANNA UNIVERSITY: CHENNAI

March-2023

ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “PLANT DISEASE DETECTION”


is the bonafide work of JOSE RONALDO A (312320205061) and
KEVIN HARRIS D (312320205069) who carried out the project under
my supervision.

SIGNATURE                                          SIGNATURE
Ms. K. Priyadharshini, M.E.,                       Dr. V. Muthulakshmi, M.E., Ph.D.,
Project Guide,                                     Head of the Department - Lab Affairs,
Assistant Professor,                               Associate Professor,
Department of IT,                                  Department of IT,
St. Joseph's College of Engineering,               St. Joseph's College of Engineering,
OMR, Chennai - 600119.                             OMR, Chennai - 600119.

CERTIFICATE OF EVALUATION

COLLEGE NAME : St. Joseph’s College of Engineering, Chennai-600119.


BRANCH : B.TECH., IT (Information Technology)
SEMESTER : VI

SL. NO   NAME OF THE STUDENT              TITLE OF THE PROJECT            NAME OF THE SUPERVISOR WITH DESIGNATION

1        JOSE RONALDO A (312320205061)    PLANT LEAF DISEASE DETECTION    Mrs. K. Priyadharshini, M.E., (Ph.D)., Assistant Professor

2        KEVIN HARRIS D (312320205069)    PLANT LEAF DISEASE DETECTION    Mrs. K. Priyadharshini, M.E., (Ph.D)., Assistant Professor

The report of the project work submitted by the above students in partial fulfillment
for the award of the Bachelor of Technology Degree in Information Technology of Anna
University was confirmed to be the report of the work done by the above students and
then evaluated.

Submitted to Project and Viva Examination held on .

INTERNAL EXAMINER EXTERNAL EXAMINER


ABSTRACT
The transmission of diseases from unhealthy to healthy plants is one of the most
disastrous threats to the agriculture industry. Transferred diseases spread like
wildfire and have the potential to infest the whole farm if not detected early.
Plant disease detection methods aid in identifying infected plants at a very early
stage and also help the user scale the identification of plant diseases to a variety
of plants in a cost-effective manner. The aim of this thesis is to implement two
different machine learning models, namely Convolution Neural Networks (CNN) and
K-nearest Neighbors (KNN), for the application of plant disease detection in tomato
leaves. The two machine learning models were evaluated on four different metrics,
Accuracy, Precision, Recall and F1-Score, in order to find the best performing model
among the two. Other than identifying the diseases using the aforementioned machine
learning models, this study also focused on providing explainability for the
predictions made by the respective models using the Explainable Artificial
Intelligence technique Local Interpretable Model-agnostic Explanations (LIME). In
order to collect domain-specific expertise, a user study was conducted in which user
trust in the AI and XAI models was evaluated and feedback from farmers was collected
so as to provide recommendations for future research. The results of implementing the
machine learning models show that the CNN model performed better than the KNN model
on all four evaluation metrics, and the results of the user study signify that the
farmers do not trust the AI and XAI models; however, the feedback collected from the
farmers through the user study helps identify areas in which the trust of the farmers
can be grown and strengthened.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 ARTIFICIAL NEURAL NETWORKS
   1.2 CONVOLUTION NEURAL NETWORKS
   1.3 K-NEAREST NEIGHBOURS
   1.4 HYPOTHESIS
   1.5 OBJECTIVES
2. LITERATURE REVIEW
3. SYSTEM ANALYSIS
   3.1 SELECTION OF METHODS AND METRICS
   3.2 LIBRARIES USED
   3.3 CONFIGURATIONS
   3.4 XAI MODEL
   3.5 USER STUDY
4. SYSTEM DESIGN
   4.1 CLASS DIAGRAM
   4.2 SEQUENCE DIAGRAM
   4.3 ACTIVITY DIAGRAM
   4.4 DATA FLOW DIAGRAM
5. RESULTS
   5.1 RESULTS FROM CLASSIFICATION MODELS
   5.2 RESULTS FROM XAI MODEL
   5.3 USER STUDY
6. CONCLUSIONS
   6.1 FUTURE WORKS
APPENDIX 1
APPENDIX 2
REFERENCES

CHAPTER 1: INTRODUCTION

For a long time, the agriculture industry has used modern science to meet the
food demands of 7 billion people. However, people working in the agriculture
industry face numerous threats that endanger the food security of human society.
Some of these threats include climate change, livestock grazing and plant diseases
(Food and Agriculture Organization of the United Nations, 2017). Among these many
threats, the effect of plant disease is truly momentous, as it not only causes huge
wastage of plants meant for human consumption but also immensely affects the health
of human society and the lives of the farmers whose main source of income is the
production of healthy crops (Al-Sadi, 2017; Somowiyarjo, 2011). During the process
of plant harvesting, human experts go through a tedious process of checking and
removing mature plants, making sure they are not affected by any disease and are
suitable for human consumption. However, this traditional visual process of
identifying the disease a particular plant is suffering from consumes a lot of time
and is expensive, especially if the farmhouse is big and there are a lot of plants
(Gavhale and Gawande, 2014). Furthermore, with the population of the world
increasing day by day, it is only practical that this process be automated so that
the growing demands of the people can be met.

1.1 ARTIFICIAL NEURAL NETWORKS


Artificial Neural Networks, abbreviated as ANNs, can be described
as computational systems designed to replicate the human brain's analysis
and information processing skills. Just like the human brain, an ANN consists
of a directed graph with interconnected processing elements known as neurons
(Jain, Mao and Mohiuddin, 1996). There are different types of neural networks,
such as the Recurrent Neural Network (RNN), Multilayer Perceptron (MLP) and
Convolutional Neural Network (CNN). Among the many neural networks, the most
common one is the Multilayer Perceptron (MLP). A typical MLP network comprises
different node layers, namely an input layer, one or more hidden layers and
finally an output layer. All of the nodes are connected to each other and each
node is associated with a weight and a threshold. The weight represents the
importance of the connection between two nodes in the network, and data is
transferred from one layer to the next only if the output of an individual node
is above the specified threshold (Jain, Mao and Mohiuddin, 1996). An ANN's
training process involves recognizing patterns in the data that is fed into the
network. During this supervised learning phase, the actual and the desired
outputs of the ANN are compared using a cost function, and the training data is
presented a number of times (epochs) until there is little or no difference
between the actual and desired outputs.
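To make the mechanics of a single node concrete, the minimal sketch below (an illustrative example with made-up values, not part of the project's implementation) computes the weighted sum of a node's inputs and passes its output onward only when that sum exceeds the node's threshold, mirroring the description above.

import numpy as np

def neuron_output(inputs, weights, threshold):
    # Weighted sum of the inputs; the node only passes data on above its threshold
    weighted_sum = np.dot(inputs, weights)
    return weighted_sum if weighted_sum > threshold else 0.0

# Hypothetical values purely for illustration
inputs = np.array([0.8, 0.2, 0.5])      # outputs of the previous layer
weights = np.array([0.4, 0.9, -0.3])    # importance of each connection
print(neuron_output(inputs, weights, threshold=0.3))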

1.2 CONVOLUTION NEURAL NETWORKS


Regular Neural Networks such as the Multi-layer Perceptron (MLP),
in the past were used for image classification purposes however as the
resolution of the images being used to classify became higher and higher
the networks became computationally hard to deal with and the number of
total parameters used for classification would be far too many.
Convolutional Neural Networks are very similar in working to regular
neural networks such as the MLP, however, what changes in Convolutional
Neural Networks is that the layers of a CNN have three-dimensional
arrangement (width, height and depth) of neurons instead of the standard
two-dimensional array and for this simple reason CNNs are widely used on
image data for the purpose of classification as the architecture of a CNN is
9
designed to take advantage of the 3d form of an image. A simple
Convolutional Neural Network’s architecture consists of three main layers,
namely, Convolutional layer, Pooling layer and the Fully connected layer.
The Convolutional layer is regarded as the main building block of a CNN,
it consists of learnable parameters known as filters/kernels. The filter is
responsible for finding patterns (textures, edges, shapes, objects, etc) in the
input image. Each filter slides/convolves over the height and width of the
input image, computing the dot product between the filter and the pixels
present in the input image. The resultant of a Convolutional layer is a feature
map that summarizes all the features found in the input image (Yamashita
et al., 2018).
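As a simplified illustration of the convolution operation described above (a toy sketch with random values, ignoring padding, stride and multiple channels), a filter can be slid over an image and the dot product computed at each position to build a feature map:

import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image and take the dot product at each position
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

image = np.random.rand(5, 5)              # toy single-channel 'image'
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])      # simple vertical-edge kernel
print(convolve2d(image, edge_filter))     # 3 x 3 feature map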

1.3 K-NEAREST NEIGHBOURS


The K-nearest Neighbors algorithm is a machine learning algorithm that estimates
how likely a new sample is to belong to a label, based on a majority vote among
the labels of the data points closest to the new sample. The algorithm works by
calculating the distance between the new sample and all the other samples, after
which it sorts the calculated values in ascending order. Based on the 'k' value,
the majority label is chosen as the prediction; here 'k' signifies the number of
nearest neighbors included in the vote for the majority label (Guo et al., 2003).
The choice of 'k' makes or breaks the algorithm's predictions. If the value of 'k'
is too small, the classification is very vulnerable to noise and the model may be
overfitted; on the other hand, if the 'k' value is too large, the model will most
likely classify any new sample as the majority label all the time (Paryudi, 2019).
KNN is regarded as one of the most simple and intuitive classification algorithms,
and it belongs to the class of lazy learner algorithms that do not perform any
generalization on the training data until a query is made to the system
(Guo et al., 2003).
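A minimal example of how such a classifier could be built with scikit-learn (one of the libraries listed later in this report) is sketched below; the random feature matrix, labels and the choice of k = 5 are illustrative assumptions only.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy data: each row is a flattened image (or any feature vector), each label a class id
X = np.random.rand(100, 64)
y = np.random.randint(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# k controls how many neighbours vote on the label of a new sample
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))   # mean accuracy on the held-out samples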
1.4 HYPOTHESIS

The study hypothesises that the CNN model may perform better than the KNN model,
as CNNs are known to perform well with large amounts of data and on image
classification use cases. With regards to the user study, the study hypothesises
that the farmers may trust the implemented AI and XAI models' predictions and
explanations, owing to the positive feedback such tools received in Malhi et al.
(2019). Comparing the two models, it is also hypothesised that the farmers may
trust KNN with LIME more than CNN with LIME due to its simplicity, and distrust
CNN with LIME due to its complexity.

1.5 OBJECTIVES
1. Research the related problem area
2. Implement the pre-processing on the dataset
3. Implement CNN and KNN
4. Train and test CNN
5. Train and test KNN
6. Evaluate the performance of both the models
7. Apply XAI technique on both the models in order to find the features
responsible for the prediction
8. Design and execute a user study

CHAPTER 2
LITERATURE REVIEW
[1] In the paper "Deep Learning for Image-Based Plant Disease Detection",
the authors Prasanna Mohanty et al. propose an approach to detect disease
in plants by training a convolutional neural network. The CNN model is trained
to identify healthy and diseased plants of 14 species. The model achieved an
accuracy of 99.35% on test set data. When used on images procured from trusted
online sources, the model achieves an accuracy of 31.4%; while this is better
than a simple model of random selection, a more diverse set of training data
could help increase the accuracy. Other variations of the model or of the
neural network training may also yield higher accuracy, paving the path for
making plant disease detection easily available to everyone.

[2] Malvika Ranjan et al., in the paper "Detection and Classification of Leaf
Disease using Artificial Neural Network", proposed an approach to detect
diseases in plants utilizing the captured image of the diseased leaf. An
Artificial Neural Network (ANN) is trained by properly choosing feature values
to distinguish diseased plants from healthy samples. The ANN model achieves an
accuracy of 80%.

[3] According to the paper "Detection of Unhealthy Region of Plant Leaves and
Classification of Plant Leaf Diseases using Texture Features" by S. Arivazhagan,
the disease identification process includes four main steps: first, a colour
transformation structure is applied to the input RGB image; then, by means of a
specific threshold value, the green pixels are detected and removed; this is
followed by a segmentation process, and the texture statistics are computed to
obtain the useful segments. At last, a classifier uses the extracted features to
classify the disease.

[4] Kulkarni et al., in the paper "Applying Image Processing Technique to Detect
Plant Diseases", propose a methodology for early and accurate plant disease
detection using an artificial neural network (ANN) and diverse image processing
techniques. As the proposed approach is based on an ANN classifier for
classification and a Gabor filter for feature extraction, it gives better
results, with a recognition rate of up to 91%.

[5] In the paper "Plant Disease Detection using CNN and GAN" by Emanuel Cortes,
an approach to detect plant disease using Generative Adversarial Networks has
been proposed. Background segmentation is used to ensure proper feature
extraction and output mapping. It is seen that using GANs may hold promise for
classifying diseases in plants; however, segmenting based on background did not
improve accuracy.

[6] In the paper "Convolutional Neural Network based Inception v3 Model for
Animal Classification", Jyotsna Bankar et al. have proposed the use of the
Inception v3 model for classifying animals into different species. Inception v3
can be used to classify objects as well as to categorize them; this capability
makes it instrumental in various image classifiers.

CHAPTER 3
SYSTEM ANALYSIS

3.1 SELECTION OF METHODS AND METRICS


Plant disease detection can be performed using different classifiers and
a multitude of techniques have been used in the past for this purpose. In this thesis,
the classifiers that were used for performing the detection were the Convolution
Neural Network (CNN) and K-nearest Neighbor (KNN). The XAI technique used
for providing explainability for the predictions made by the classifiers was Local
Interpretable Model-agnostic Explanations (LIME). The evaluation of the
aforementioned models was done using the following metrics: accuracy, precision,
recall and f1-score. Each classifier was evaluated using the same four evaluation
metrics and the results of both the classifiers were used to find the best performing
model on the disease detection of tomato leaves from the plant village dataset.
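The four evaluation metrics can be computed directly from a model's predictions with scikit-learn, as in the sketch below; the label arrays are placeholders and the macro averaging is an assumption made here for the multi-class setting, not a detail stated in this report.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Placeholder ground-truth and predicted class ids for a multi-class problem
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

print('Accuracy :', accuracy_score(y_true, y_pred))
print('Precision:', precision_score(y_true, y_pred, average='macro'))
print('Recall   :', recall_score(y_true, y_pred, average='macro'))
print('F1-Score :', f1_score(y_true, y_pred, average='macro'))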

3.2 LIBRARIES USED

Libraries used in the implementation of CNN and KNN:

NumPy is a Python library that helps in managing large, multidimensional arrays
and matrices, and enables mathematical operations on the created arrays or
matrices.

cv2 (OpenCV) is an open-source computer vision and machine learning library used
to solve computer vision problems such as reading or resizing an image.

Sklearn (scikit-learn) is an open-source machine learning library that includes
various classification and regression algorithms such as K-nearest Neighbors,
support vector machines and random forests. It is built on top of NumPy and
SciPy.

Keras is an open-source neural network library designed to enable the fast
implementation of deep neural networks within the Python interface itself.

Matplotlib is a plotting library used in Python for creating static and animated
visualizations.

Lime is an open-source Python library used to implement the LIME model that
explains black-box classifiers.

3.3 CONFIGURATIONS
The architecture of the CNN used for plant disease detection in this study was as
follows. The first block contains a Convolutional layer with 32 filters of size
3 x 3, using the ReLU activation function. This is followed by batch
normalization, a Max Pooling layer with a pool size of (3,3) and a dropout layer
with 25% dropout. Batch normalization was performed in order to speed up the
convergence of the neural network; it is generally applied after each individual
layer so that the output of the previous layer can be normalized, allowing each
individual layer in the network to learn independently (Garbin, Zhu and Marques,
2020). The dropout layer is a technique used to prevent the model from
overfitting by randomly switching off some sections of the neurons. When some
sections of the neurons are switched off, the incoming as well as the outgoing
connections of those neurons are also switched off; this improves the model's
learning and keeps the model from failing to generalize to the test dataset
(Garbin, Zhu and Marques, 2020).

The second block in the network contains two convolutional layers with 64 filters
of size 3 x 3, each with ReLU activation and batch normalization, after which a
Max Pooling layer with pool size (2,2) and a dropout layer with 25% dropout were
added. The third block contains two convolutional layers with 128 filters of size
3 x 3, each with ReLU activation and batch normalization, after which, similar to
the second block, a Max Pooling layer with pool size (2,2) and a dropout layer
with 25% dropout were added.

To complete the model, a flattening operation was performed to convert the 3D
output of the last convolution layer into 1D form before feeding it into the
Fully connected layers for classification. For the classification of the features
extracted from the last convolution and pooling layer, a dense layer was added
with 1024 neurons, ReLU activation, batch normalization and a dropout layer with
5% dropout to improve the results of the model. The final layer in the neural
network is the logits layer, which returns the values of the predictions made by
the model; for this, a dense layer with 10 neurons (as there are 10 classes) and
the Softmax activation function was used. The Softmax activation was used in this
layer because this is a multi-class classification model.
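The architecture described above could be assembled in Keras roughly as follows. This is a sketch for illustration rather than the exact training script used in this project; in particular, the 256 x 256 x 3 input shape and the compile settings are assumptions not stated in this section.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, MaxPooling2D,
                                     Dropout, Flatten, Dense)

model = Sequential([
    # Block 1: 32 filters of size 3 x 3, batch norm, 3 x 3 max pooling, 25% dropout
    Conv2D(32, (3, 3), activation='relu', input_shape=(256, 256, 3)),
    BatchNormalization(),
    MaxPooling2D(pool_size=(3, 3)),
    Dropout(0.25),

    # Block 2: two convolutional layers with 64 filters each
    Conv2D(64, (3, 3), activation='relu'),
    BatchNormalization(),
    Conv2D(64, (3, 3), activation='relu'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),

    # Block 3: two convolutional layers with 128 filters each
    Conv2D(128, (3, 3), activation='relu'),
    BatchNormalization(),
    Conv2D(128, (3, 3), activation='relu'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),

    # Fully connected classifier head
    Flatten(),
    Dense(1024, activation='relu'),
    BatchNormalization(),
    Dropout(0.05),                      # 5% dropout as described above
    Dense(10, activation='softmax'),    # one output per disease class
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()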

3.4 XAI MODEL


We make use of the lime_image package to implement our LIME model. At first, we
create an object 'explainer'. This object provides the method explain_instance(),
which takes in the 3D image data and the model's predictor function
(model.predict). Based on the prediction from the trained model, the explanations
are returned. As mentioned earlier, once the images are perturbed, the trained
black-box model is used by LIME to predict each one of the artificially created
images, and weights are assigned based on the proximity of the perturbed images
to the image of interest. After assigning weights to each of the perturbed
images, a linear classifier is used to find the most important features in the
image. In order to visualize the explanations, the method get_image_and_mask()
is used. This method takes in the 3D image data passed earlier and returns an
(image, mask) tuple, where 'image' refers to the 3D numpy array representing the
image data and 'mask' refers to the 2D numpy array representing the features in
the image responsible for the prediction made by the model.
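A rough sketch of this workflow with the lime_image package is shown below. The image array and the predictor function are placeholder stand-ins (the real script would pass the trained CNN or KNN model's predict function), and parameters such as num_samples are illustrative assumptions.

import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries
import matplotlib.pyplot as plt

# Placeholder stand-ins: 'image' is an H x W x 3 array and 'predict_fn' mimics a
# trained classifier's model.predict, returning one probability row per image
image = np.random.rand(128, 128, 3)

def predict_fn(images):
    return np.tile([0.1, 0.9], (len(images), 1))   # dummy 2-class probabilities

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(image, predict_fn,
                                         top_labels=2, hide_color=0,
                                         num_samples=1000)

# Retrieve the image and the mask of superpixels driving the top prediction
temp, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                            positive_only=False,
                                            num_features=10,
                                            hide_rest=False)

plt.imshow(mark_boundaries(temp, mask))
plt.show()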

3.5 USER STUDY


A survey-style user study was conducted in order to enquire and collect
feedback from professionals in the domain (farmers) on the trustworthiness of the
implemented AI and XAI models' predictions and explanations. The survey contains
questionnaires and short text fields. The questionnaire was used to verify
whether the users find the predictions of each of the AI models and the
explanations provided by the XAI model to be trustworthy, consistent and
comprehensible, and the short text fields were used to collect additional
feedback from the users. In order to measure the users' agreement with each
question in the questionnaire, the Likert scale approach was adopted, due to its
popularity and the value it brings to a user study in quantifying a particular
user's opinion (Bishop and Herron, 2015). The Likert scale used in this study had
the following points for collecting responses from the users:

1. Strongly disagree
2. Disagree
3. Neither agree nor disagree
4. Agree
5. Strongly agree

The primary objective of this user study is to evaluate the users' trust in the
implemented AI and XAI models, and since there is no well-established methodology
for evaluating XAI systems, the metrics used in this study were adopted from
Khodabandehloo et al. (2021). The adopted evaluation metrics were:

• Human-machine task performance (HMTP), which measures whether the use of the
  tool makes the end-users more successful in their task
• Explanation satisfaction (ES), which measures the end-user's satisfaction with
  and understanding of the machine's explanations
• User trust and reliance (UTR), which measures the end-user's trust in and
  reliance on the machine's explanations

Using the above-mentioned metrics and the Likert scale approach, the user study
was designed. The user study was split into three main sections: the first
section contained the AI models' predictions without any explanations from the
XAI model, and the second section contained the AI models' predictions.

CHAPTER 4
SYSTEM DESIGN
INTRODUCTION
Software design is a process of problem-solving and planning for a
software solution. After the purpose and specifications of the software are
determined, software developers design, or employ designers to develop, a plan
for a solution. It includes component construction and algorithm implementation
issues, which are shown in the architectural view. This chapter introduces some
of the principles that are considered during software design.

4.1 Class Diagram

A class diagram is a picture for describing generic descriptions of possible
systems. Class diagrams and collaboration diagrams are alternative
representations of object models. Class diagrams contain classes and object
diagrams contain objects, but it is possible to mix classes and objects when
dealing with various kinds of metadata, so the separation is not rigid.

4.2 Sequence Diagram

A sequence diagram shows object interactions arranged in time


sequence. It depicts the objects and classes involved in the scenario and the
sequence of messages exchanged between the objects needed to carry out the
functionality of the scenario.

4.3 ACTIVITY DIAGRAM
Activity diagrams are graphical representations of workflows of stepwise
activities and actions with support for choice, iteration and concurrency. In the
Unified Modelling Language, activity diagrams can be used to describe the
business and operational step-by-step workflows of components in a system.
An activity diagram shows the overall flow of processes.

4.4 Data Flow Diagram

A data flow diagram (DFD) is a graphical representation of the "flow"


of data through an information system, modeling its process aspects. Often
they are a preliminary step used to create an overview of the system which
can later be elaborated. DFDs can also be used for the visualization of data
processing (structured design).

A DFD shows what kinds of information will be input to and output from the
system, where the data will come from and go to, and where the data will be
stored.

It does not show information about the timing of processes, or information


about whether processes will operate in sequence or in parallel.

CHAPTER 5
RESULTS
5.1 RESULTS FROM CLASSIFICATION MODELS
Table 6 - CNN evaluation metrics

Accuracy (%)   Precision (%)   Recall (%)   F1-Score (%)
98.5           93              93           93

Table 7 - KNN evaluation metrics

Accuracy (%)   Precision (%)   Recall (%)   F1-Score (%)
83.6           90              84           86

The tables above present the Accuracy, Precision, Recall and F1-Score of both the
CNN and the KNN model. In order to understand these evaluation metrics, it is
important to understand what True Positives (TP), True Negatives (TN), False
Positives (FP) and False Negatives (FN) are. For example, for an image of a
tomato leaf that contains the disease 'Early blight', the confusion matrix looks
as shown in Fig 4. The confusion matrix shown below is for one of the 10
categories/diseases.

Fig 4 - Confusion matrix for 'Early blight'

                               Actual: Early blight    Actual: Not Early blight
Predicted: Early blight        TP                      FP
Predicted: Not Early blight    FN                      TN

If the model correctly predicts the image of the plant as containing the disease,
the outcome is known as a True Positive (TP). If the model correctly predicts the
image of the plant as not containing the disease, the outcome is known as a True
Negative (TN). If the model incorrectly predicts the image of the plant as
containing the disease, the outcome is known as a False Positive (FP). If the
model incorrectly predicts the image of the plant as not containing the disease,
the outcome is known as a False Negative (FN).
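For reference, the four metrics follow directly from these outcome counts (standard definitions, stated here for clarity):

Accuracy  = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall    = TP / (TP + FN)
F1-Score  = 2 * (Precision * Recall) / (Precision + Recall)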

5.2 RESULTS FROM XAI MODEL
Probabilities produced by LIME: the output produced by LIME to explain the CNN
model's predictions highlights the parts of the image that decrease the
probability (shown in red) of the image belonging to 'Late blight'. The same
image is fed into LIME in order to explain the KNN model's predictions and, just
as for the CNN model, LIME showcases the features responsible for the KNN model's
prediction, again marking in red the parts that decrease the probability of the
image belonging to 'Late blight'.

As discussed earlier in the study, we know that LIME creates simulated data
around the original prediction through the process known as perturbation when
generating explanations for a single prediction. The simulated data created
through perturbation is generated randomly, which means that every time LIME is
executed the explanations returned for a single prediction will be different.
This instability in LIME was found while testing the XAI model numerous times to
return explanations for a single prediction, and it can also be seen in LIME's
explanations for the CNN's predictions in the survey (attached in Appendix B).
Where user trust is concerned, stability is of utmost importance, and LIME's
instability is a critical issue that makes its explanations hard for users to
trust. Research conducted by Zafar and Khan (2019) also mentions this instability
in LIME and proposes an alternative method known as Deterministic Local
Interpretable Model-Agnostic Explanations (DLIME), which uses hierarchical
clustering to group the training data and makes use of KNN instead of a linear
classifier to return explanations. That study makes use of health care datasets
from the UCI repository. Its results signify that even though DLIME is successful
in producing stable explanations, its stability depends on the number of samples
present in the dataset. The study further adds that in the future it looks
forward to solving this issue while maintaining the stability of model
explanations, and to experimenting with other data types such as image data and
text data.

5.3 USER STUDY


The user study received a total of 7 responses from farmers in the
"World Agriculture group for farmers" Facebook group, and the responses to each
question were visualised using the plot_likert package. The work experience of
all 7 farmers was between 5 and 10 years, which makes their feedback valuable and
important. Comparing CNN with LIME and KNN with LIME on the user trust and
reliance score from Table 8, we can see that the farmers trust the CNN with LIME
model more than the KNN with LIME model, as the CNN with LIME model's score is
higher. With regards to explanation satisfaction and human-machine task
performance, the KNN with LIME model is what the farmers seem to prefer, judging
from Table 8. However, based on the feedback received and the Likert scale
visualisations, it can be observed that no farmer agreed or strongly agreed with
any of the UTR questions asked. In particular, the feedback received makes it
clear that LIME's explanations for both models are not adequate to gain the
farmers' trust completely. Therefore, to answer our second research question: no,
the farmers do not find the explanations provided by LIME for the respective AI
models' predictions easy to trust. The reasons for the above results from the
user study are discussed in the next section in three parts: user study
limitations, the XAI model, and analysis of the results from the user study.
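A rough sketch of how such responses could be visualised with the plot_likert package is given below; the question columns and response strings are placeholders, and the call follows the package's documented usage rather than this study's actual script.

import pandas as pd
import matplotlib.pyplot as plt
import plot_likert

# Placeholder responses: one row per farmer, one column per questionnaire item
responses = pd.DataFrame({
    'The prediction is trustworthy': ['Disagree', 'Neither agree nor disagree',
                                      'Disagree', 'Agree', 'Disagree',
                                      'Strongly disagree', 'Disagree'],
    'The explanation is understandable': ['Neither agree nor disagree', 'Disagree',
                                          'Agree', 'Disagree', 'Disagree',
                                          'Disagree', 'Neither agree nor disagree'],
})

# plot_likert.scales.agree is the standard 5-point agree/disagree scale
plot_likert.plot_likert(responses, plot_likert.scales.agree, plot_percentage=True)
plt.show()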

CHAPTER 6
CONCLUSIONS
On implementing two machine learning models, Convolutional Neural Networks (CNN)
and K-nearest Neighbors (KNN), for the disease detection of tomato leaves from
the plant village dataset, and evaluating the aforementioned models using the
metrics Accuracy, Precision, Recall and F1-Score, the study shows that the CNN
model performs better than the KNN model in the plant disease detection of tomato
leaves, outperforming the KNN model in all four evaluation metrics. The study
also makes use of the XAI technique Local Interpretable Model-agnostic
Explanations (LIME) in order to provide explainability for the predictions made
by the models. With the execution of a user study, this study was able to get
feedback from farmers on whether they trust the aforementioned AI and XAI models.
The results from the user study indicate that the farmers find the predictions
and explanations from the AI and XAI models inadequate and therefore do not trust
the implemented tools for the detection of plant diseases. However, through
additional feedback the farmers highlight areas that could help improve the AI
and XAI models and strengthen trust in them.

6.1 FUTURE WORKS


The dataset used in this study contains only tomato leaves from the plant village
dataset. Due to the lack of sufficient Random Access Memory (RAM), the study had
to be limited to only 10,000 images. In the future, it would be worthwhile to
test the implementation of both the CNN and the KNN model, and to apply LIME, on
the whole plant village dataset containing multiple different plants, in order to
bring detection and explainability to a wide variety of plants. Another work that
this study would like to pursue in the future is a comparative study of different
XAI techniques, together with a user study, in order to find out which XAI
technique provides the best explainability, transparency and interpretability.
With the addition of data on volatile organic compounds, soil types,
environmental conditions and time of the month, as mentioned by farmers in their
feedback from the user study, user trust in the detection tool is expected to
grow. As discussed earlier in the use case of this study, a working application
that is capable of taking pictures of plants and detecting plant diseases in real
time is the ideal goal and will prove to be of great use to farmers and botany
enthusiasts.

APPENDIX 1
SAMPLE CODE
CNN.PY :
import cv2
import numpy as np
import os
from random import shuffle
from tqdm import tqdm

TRAIN_DIR = 'train/train'
TEST_DIR = 'test/test'
IMG_SIZE = 50
LR = 1e-3
MODEL_NAME = 'healthyvsunhealthy-{}-{}.model'.format(LR, '2conv-basic')

# Map the first letter of the file name to a one-hot label:
# h = healthy, b = bacterial, v = viral, l = late blight
def label_img(img):
    word_label = img[0]
    if word_label == 'h':
        return [1, 0, 0, 0]
    elif word_label == 'b':
        return [0, 1, 0, 0]
    elif word_label == 'v':
        return [0, 0, 1, 0]
    elif word_label == 'l':
        return [0, 0, 0, 1]


# Read, resize and label every training image, then shuffle and cache to disk
def create_train_data():
    training_data = []
    for img in tqdm(os.listdir(TRAIN_DIR)):
        label = label_img(img)
        path = os.path.join(TRAIN_DIR, img)
        img = cv2.imread(path, cv2.IMREAD_COLOR)
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        training_data.append([np.array(img), np.array(label)])
    shuffle(training_data)
    np.save('train_data.npy', training_data)
    return training_data


# Read and resize the unlabelled test images, keeping the image id
def process_test_data():
    testing_data = []
    for img in tqdm(os.listdir(TEST_DIR)):
        path = os.path.join(TEST_DIR, img)
        img_num = img.split('.')[0]
        img = cv2.imread(path, cv2.IMREAD_COLOR)
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        testing_data.append([np.array(img), img_num])
    shuffle(testing_data)
    np.save('test_data.npy', testing_data)
    return testing_data


train_data = create_train_data()

import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
import tensorflow as tf
from tensorflow.python.framework import ops

ops.reset_default_graph()

# Stack of convolution + max-pooling blocks followed by a fully connected layer
convnet = input_data(shape=[None, IMG_SIZE, IMG_SIZE, 3], name='input')

convnet = conv_2d(convnet, 32, 3, activation='relu')
convnet = max_pool_2d(convnet, 3)

convnet = conv_2d(convnet, 64, 3, activation='relu')
convnet = max_pool_2d(convnet, 3)

convnet = conv_2d(convnet, 128, 3, activation='relu')
convnet = max_pool_2d(convnet, 3)

convnet = conv_2d(convnet, 32, 3, activation='relu')
convnet = max_pool_2d(convnet, 3)

convnet = conv_2d(convnet, 64, 3, activation='relu')
convnet = max_pool_2d(convnet, 3)

convnet = fully_connected(convnet, 1024, activation='relu')
convnet = dropout(convnet, 0.8)

# Four output classes: healthy, bacterial, viral, late blight
convnet = fully_connected(convnet, 4, activation='softmax')

convnet = regression(convnet, optimizer='adam', learning_rate=LR,
                     loss='categorical_crossentropy', name='targets')

model = tflearn.DNN(convnet, tensorboard_dir='log')

if os.path.exists('{}.meta'.format(MODEL_NAME)):
    model.load(MODEL_NAME)
    print('model loaded!')

# Hold out the last 500 samples for validation
train = train_data[:-500]
test = train_data[-500:]

X = np.array([i[0] for i in train]).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
Y = [i[1] for i in train]

test_x = np.array([i[0] for i in test]).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
test_y = [i[1] for i in test]

model.fit({'input': X}, {'targets': Y}, n_epoch=8,
          validation_set=({'input': test_x}, {'targets': test_y}),
          snapshot_step=40, show_metric=True, run_id=MODEL_NAME)

model.save(MODEL_NAME)

UI.PY :
import tkinter as tk
from tkinter.filedialog import askopenfilename
import shutil
import os
import sys
from PIL import Image, ImageTk

window = tk.Tk()
window.title("Dr. Plant")
window.geometry("500x510")
window.configure(background="lightgreen")

title = tk.Label(text="Click below to choose picture for testing disease....",
                 background="lightgreen", fg="Brown", font=("", 15))
title.grid()


# Show remedies for Bacterial Spot in a new window
def bact():
    window.destroy()
    window1 = tk.Tk()
    window1.title("Dr. Plant")
    window1.geometry("500x510")
    window1.configure(background="lightgreen")

    def exit():
        window1.destroy()

    rem = "The remedies for Bacterial Spot are:\n\n "
    remedies = tk.Label(text=rem, background="lightgreen",
                        fg="Brown", font=("", 15))
    remedies.grid(column=0, row=7, padx=10, pady=10)
    rem1 = (" Discard or destroy any affected plants. \n"
            " Do not compost them. \n"
            " Rotate your tomato plants yearly to prevent re-infection next year. \n"
            " Use copper fungicides")
    remedies1 = tk.Label(text=rem1, background="lightgreen",
                         fg="Black", font=("", 12))
    remedies1.grid(column=0, row=8, padx=10, pady=10)

    button = tk.Button(text="Exit", command=exit)
    button.grid(column=0, row=9, padx=20, pady=20)
    window1.mainloop()

# Show remedies for Yellow leaf curl virus in a new window
def vir():
    window.destroy()
    window1 = tk.Tk()
    window1.title("Dr. Plant")
    window1.geometry("650x510")
    window1.configure(background="lightgreen")

    def exit():
        window1.destroy()

    rem = "The remedies for Yellow leaf curl virus are: "
    remedies = tk.Label(text=rem, background="lightgreen",
                        fg="Brown", font=("", 15))
    remedies.grid(column=0, row=7, padx=10, pady=10)
    rem1 = (" Monitor the field, handpick diseased plants and bury them. \n"
            " Use sticky yellow plastic traps. \n"
            " Spray insecticides such as organophosphates, carbamates during the seedling stage. \n"
            " Use copper fungicides")
    remedies1 = tk.Label(text=rem1, background="lightgreen",
                         fg="Black", font=("", 12))
    remedies1.grid(column=0, row=8, padx=10, pady=10)

    button = tk.Button(text="Exit", command=exit)
    button.grid(column=0, row=9, padx=20, pady=20)
    window1.mainloop()

# Show remedies for Late Blight in a new window
def latebl():
    window.destroy()
    window1 = tk.Tk()
    window1.title("Dr. Plant")
    window1.geometry("520x510")
    window1.configure(background="lightgreen")

    def exit():
        window1.destroy()

    rem = "The remedies for Late Blight are: "
    remedies = tk.Label(text=rem, background="lightgreen",
                        fg="Brown", font=("", 15))
    remedies.grid(column=0, row=7, padx=10, pady=10)

    rem1 = (" Monitor the field, remove and destroy infected leaves. \n"
            " Treat organically with copper spray. \n"
            " Use chemical fungicides, the best of which for tomatoes is chlorothalonil.")
    remedies1 = tk.Label(text=rem1, background="lightgreen",
                         fg="Black", font=("", 12))
    remedies1.grid(column=0, row=8, padx=10, pady=10)

    button = tk.Button(text="Exit", command=exit)
    button.grid(column=0, row=9, padx=20, pady=20)
    window1.mainloop()

# Load the chosen image, run it through the trained CNN and display the result
def analysis():
    import cv2
    import numpy as np
    import os
    from random import shuffle
    from tqdm import tqdm

    verify_dir = 'testpicture'
    IMG_SIZE = 50
    LR = 1e-3
    MODEL_NAME = 'healthyvsunhealthy-{}-{}.model'.format(LR, '2conv-basic')

    # Read and resize the image(s) placed in the test folder
    def process_verify_data():
        verifying_data = []
        for img in tqdm(os.listdir(verify_dir)):
            path = os.path.join(verify_dir, img)
            img_num = img.split('.')[0]
            img = cv2.imread(path, cv2.IMREAD_COLOR)
            img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
            verifying_data.append([np.array(img), img_num])
        np.save('verify_data.npy', verifying_data)
        return verifying_data

    verify_data = process_verify_data()

    import tflearn
    from tflearn.layers.conv import conv_2d, max_pool_2d
    from tflearn.layers.core import input_data, dropout, fully_connected
    from tflearn.layers.estimator import regression
    import tensorflow as tf
    from tensorflow.python.framework import ops

    ops.reset_default_graph()

    # Rebuild the same architecture as CNN.PY so the saved weights can be loaded
    convnet = input_data(shape=[None, IMG_SIZE, IMG_SIZE, 3], name='input')

    convnet = conv_2d(convnet, 32, 3, activation='relu')
    convnet = max_pool_2d(convnet, 3)

    convnet = conv_2d(convnet, 64, 3, activation='relu')
    convnet = max_pool_2d(convnet, 3)

    convnet = conv_2d(convnet, 128, 3, activation='relu')
    convnet = max_pool_2d(convnet, 3)

    convnet = conv_2d(convnet, 32, 3, activation='relu')
    convnet = max_pool_2d(convnet, 3)

    convnet = conv_2d(convnet, 64, 3, activation='relu')
    convnet = max_pool_2d(convnet, 3)

    convnet = fully_connected(convnet, 1024, activation='relu')
    convnet = dropout(convnet, 0.8)

    convnet = fully_connected(convnet, 4, activation='softmax')

    convnet = regression(convnet, optimizer='adam', learning_rate=LR,
                         loss='categorical_crossentropy', name='targets')
    model = tflearn.DNN(convnet, tensorboard_dir='log')

    if os.path.exists('{}.meta'.format(MODEL_NAME)):
        model.load(MODEL_NAME)
        print('model loaded!')

    import matplotlib.pyplot as plt

    fig = plt.figure()
    for num, data in enumerate(verify_data):
        img_num = data[1]
        img_data = data[0]
        y = fig.add_subplot(3, 4, num + 1)
        orig = img_data
        data = img_data.reshape(IMG_SIZE, IMG_SIZE, 3)
        model_out = model.predict([data])[0]

        # Map the network's output to a class name
        if np.argmax(model_out) == 0:
            str_label = 'healthy'
        elif np.argmax(model_out) == 1:
            str_label = 'bacterial'
        elif np.argmax(model_out) == 2:
            str_label = 'viral'
        elif np.argmax(model_out) == 3:
            str_label = 'lateblight'

        if str_label == 'healthy':
            status = "HEALTHY"
        else:
            status = "UNHEALTHY"

        message = tk.Label(text='Status: ' + status, background="lightgreen",
                           fg="Brown", font=("", 15))
        message.grid(column=0, row=3, padx=10, pady=10)

        # Show the disease name and a button leading to the matching remedies window
        if str_label == 'bacterial':
            diseasename = "Bacterial Spot "
            disease = tk.Label(text='Disease Name: ' + diseasename,
                               background="lightgreen", fg="Black", font=("", 15))
            disease.grid(column=0, row=4, padx=10, pady=10)
            r = tk.Label(text='Click below for remedies...', background="lightgreen",
                         fg="Brown", font=("", 15))
            r.grid(column=0, row=5, padx=10, pady=10)
            button3 = tk.Button(text="Remedies", command=bact)
            button3.grid(column=0, row=6, padx=10, pady=10)
        elif str_label == 'viral':
            diseasename = "Yellow leaf curl virus "
            disease = tk.Label(text='Disease Name: ' + diseasename,
                               background="lightgreen", fg="Black", font=("", 15))
            disease.grid(column=0, row=4, padx=10, pady=10)
            r = tk.Label(text='Click below for remedies...', background="lightgreen",
                         fg="Brown", font=("", 15))
            r.grid(column=0, row=5, padx=10, pady=10)
            button3 = tk.Button(text="Remedies", command=vir)
            button3.grid(column=0, row=6, padx=10, pady=10)
        elif str_label == 'lateblight':
            diseasename = "Late Blight "
            disease = tk.Label(text='Disease Name: ' + diseasename,
                               background="lightgreen", fg="Black", font=("", 15))
            disease.grid(column=0, row=4, padx=10, pady=10)
            r = tk.Label(text='Click below for remedies...', background="lightgreen",
                         fg="Brown", font=("", 15))
            r.grid(column=0, row=5, padx=10, pady=10)
            button3 = tk.Button(text="Remedies", command=latebl)
            button3.grid(column=0, row=6, padx=10, pady=10)
        else:
            r = tk.Label(text='Plant is healthy', background="lightgreen", fg="Black",
                         font=("", 15))
            r.grid(column=0, row=4, padx=10, pady=10)

        button = tk.Button(text="Exit", command=exit)
        button.grid(column=0, row=9, padx=20, pady=20)

# Let the user pick an image, copy it into the test folder and show it in the window
def openphoto():
    dirPath = "testpicture"
    fileList = os.listdir(dirPath)
    for fileName in fileList:
        os.remove(dirPath + "/" + fileName)
    fileName = askopenfilename(initialdir='D:/MiniProject/train/train',
                               title='Select image for analysis ',
                               filetypes=[('image files', '.jpg')])
    dst = "D:/MiniProject/testpicture"
    shutil.copy(fileName, dst)
    load = Image.open(fileName)
    render = ImageTk.PhotoImage(load)
    img = tk.Label(image=render, height="250", width="500")
    img.image = render
    img.place(x=0, y=0)
    img.grid(column=0, row=1, padx=10, pady=10)
    title.destroy()
    button1.destroy()
    button2 = tk.Button(text="Analyse Image", command=analysis)
    button2.grid(column=0, row=2, padx=10, pady=10)


button1 = tk.Button(text="Get Photo", command=openphoto)
button1.grid(column=0, row=1, padx=10, pady=10)

window.mainloop()

APPENDIX 2
CNN.PY:

UI.PY:

REFERENCES

Agarwal, M. et al. (2020) 'ToLeD: Tomato Leaf Disease Detection using Convolution Neural Network', Procedia Computer Science, 167, pp. 293–301. doi: 10.1016/j.procs.2020.03.225.

Agrios, G. N. (2005) 'Chapter Ten - Environmental Factors that Cause Plant Diseases', in Agrios, G. N. (ed.) Plant Pathology (Fifth Edition). San Diego: Academic Press, pp. 357–384. doi: 10.1016/B978-0-08-047378-9.50016-6.

Al-Sadi, A. (2017) 'Impact of Plant Diseases on Human Health', International Journal of Nutrition, Pharmacology, Neurological Diseases, 7, pp. 21–22. doi: 10.4103/ijnpnd.ijnpnd_24_17.

Ault, R. (2020) 'Optimization Study of an Image Classification Deep Neural Network', Final Report, p. 10.

Bishop, P. and Herron, R. (2015) 'Use and Misuse of the Likert Item Responses and Other Ordinal Measures', International Journal of Exercise Science, 8, Article 10.

Chakrabartty, S. N. (2014) 'Scoring and Analysis of Likert Scale: Few Approaches', Journal of Knowledge Management & Information Technology, 1.

Dieber, J. and Kirrane, S. (2020) 'Why model why? Assessing the strengths and limitations of LIME', CoRR.

Food and Agriculture Organization of the United Nations (ed.) (2017) The future of food and agriculture: trends and challenges. Rome: Food and Agriculture Organization of the United Nations.

Garbin, C., Zhu, X. and Marques, O. (2020) 'Dropout vs. batch normalization: an empirical study of their impact to deep learning', Multimedia Tools and Applications, 79(19), pp. 12777–12815. doi: 10.1007/s11042-019-08453-9.

Gavhale, M. and Gawande, U. (2014) 'An Overview of the Research on Plant Leaves Disease Detection using Image Processing Techniques', IOSR Journal of Computer Engineering, 16, pp. 10–16. doi: 10.9790/0661-16151016.

Guo, G. et al. (2003) 'KNN Model-Based Approach in Classification', in Meersman, R., Tari, Z. and Schmidt, D. C. (eds) On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. Berlin, Heidelberg: Springer (Lecture Notes in Computer Science), pp. 986–996. doi: 10.1007/978-3-540-39964-3_62.

Hatuwal, B. K., Shakya, A. and Joshi, B. (2020) 'Plant Leaf Disease Recognition Using Random Forest, KNN, SVM and CNN', POLIBITS, 62, p. 7.