
AGRI DETECT: MACHINE LEARNING FOR TIMELY PLANT

DISEASE IDENTIFICATION USING LEAVES

A PROJECT REPORT

Submitted by

MAHARAJA K (113120UG07058)

MEENAKSHI G (113120UG07060)

PRAKASH S (113120UG07072)

TARUNIKA K (113120UG07100)

in partial fulfillment for the award of the degree


of
BACHELOR OF TECHNOLOGY

IN

INFORMATION TECHNOLOGY

VEL TECH MULTI TECH Dr. RANGARAJAN Dr. SAKUNTHALA


ENGINEERING COLLEGE,
ALAMATHI ROAD, AVADI, CHENNAI-62

ANNA UNIVERSITY :: CHENNAI 600 025


APRIL 2024
BONAFIDE CERTIFICATE

Certified that this project report titled “AGRI DETECT: MACHINE
LEARNING FOR TIMELY PLANT DISEASE IDENTIFICATION USING
LEAVES” is the bonafide work of MAHARAJA K (113120UG07058),
MEENAKSHI G (113120UG07060), PRAKASH S (113120UG07072) and
TARUNIKA K (113120UG07100), who carried out the project work under my
supervision. Certified further that, to the best of my knowledge, the work
reported herein does not form part of any other project report or dissertation on
the basis of which a degree or award was conferred on an earlier occasion on this
or any other candidate.

SIGNATURE
Mr. R. PRABU, M.Tech.,
HEAD OF THE DEPARTMENT
Department of Information Technology
Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College,
Avadi, Chennai-600 062

SIGNATURE
Dr. V. SURESH KUMAR, Ph.D
INTERNAL GUIDE
PROFESSOR
Department of Information Technology
Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College,
Avadi, Chennai-600 062

CERTIFICATE FOR EVALUATION

College Name : VEL TECH MULTI TECH DR. RANGARAJAN
               DR. SAKUNTHALA ENGINEERING COLLEGE
Branch       : INFORMATION TECHNOLOGY
Semester     : VIII

S.NO  NAME OF THE STUDENTS      TITLE OF THE PROJECT        NAME OF THE SUPERVISOR
      WHO HAVE DONE THE
      PROJECT
1.    MAHARAJA K                AGRI DETECT: MACHINE        Dr. V. SURESH KUMAR, Ph.D
      [113120UG07058]           LEARNING FOR TIMELY         PROFESSOR,
2.    MEENAKSHI G               PLANT DISEASE               Department of
      [113120UG07060]           IDENTIFICATION USING        Information Technology
3.    PRAKASH S                 LEAVES
      [113120UG07072]
4.    TARUNIKA K
      [113120UG07100]

The report of this project work submitted by the above students in partial fulfilment
for the award of Degree of Bachelor of Technology in Information Technology of
Anna University was evaluated and confirmed to be the report of work done by the
above students. This project report was submitted for the viva-voce held on
________________ at Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala
Engineering College.

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT

We wish to express our sincere thanks to the Almighty and to the people who
extended their help during the course of our work. We are greatly and profoundly
thankful to our honorable Chairman, Col. Prof. Vel. Shri Dr. R. Rangarajan
B.E.(ELEC), B.E.(MECH), M.S.(AUTO)., D.Sc., and Vice Chairman,
Dr. Sakunthala Rangarajan M.B.B.S., for providing us with this opportunity.
We also extend our gratitude to our respected Chairperson
& Managing Trustee Smt. Rangarajan Mahalakshmi Kishore B.E., M.B.A.,
for her continuous encouragement.

Our special thanks go to our cherished Vice-President Mr. K.V.D. Kishore
Kumar B.E., M.B.A., for his attention towards the student community. We also
record our sincere thanks to our honorable Principal, Dr. V. Rajamani M.E.,
Ph.D, for his kind support in taking up this project and completing it successfully.

We would like to express our special thanks to our Head of the Department,
Mr. R. Prabu, M.Tech., Department of Information Technology, our project
internal guide Dr. V. Suresh Kumar, Ph.D, and our project coordinator Dr. M.
Rajesh Khanna, Ph.D, for their moral support, for taking a keen interest in our
project work, for guiding us all along until its completion, and for providing
all the information required to develop a good system and complete it
successfully.

Further, the acknowledgement would be incomplete without a word of thanks
to our most beloved parents for their continuous support
and encouragement throughout the course, which has led us to pursue the
degree and confidently complete the project work.

ABSTRACT
Global agriculture is seriously threatened by plant diseases, which affect both
food security and economic stability. Real-time detection makes a quick response
possible, preventing the spread of disease and protecting agricultural production.
Achieving detection with high accuracy and low latency is important but can be
challenging, particularly for computationally intensive models. Plant leaf image
datasets vary in size, yet existing studies typically carry out the analysis with
a single algorithm, which may not suit all datasets.
This work develops an automated method for the early detection of diseases
affecting plant leaves. It uses four distinct machine learning algorithms:
Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Convolutional
Neural Networks (CNN), and Decision Tree. Because MATLAB is well suited to
numerical computing, we used it to obtain results that are as precise as possible.
The system allows many analyses to be run, with the best technique chosen
according to the requirements and the kind of dataset. The proposed system
reports the findings obtained from all four algorithms. It
analyzes different plant leaf diseases and
predicts the percentage of diseased leaves. As a result, we can select the
algorithm best suited to identifying diseases in plant leaves at an early
stage. By analyzing crop images, it is possible to identify even the smallest signs
of disease, allowing for timely intervention to halt the disease's course.
Reducing the extent of such interventions may, in turn, allow sustainable
agriculture to develop in harmony with the environment.

LIST OF FIGURES

FIGURE NO   NAME OF THE FIGURE                                  PAGE NO

1           PLDD Progression Plan                               18
2           Use Case Diagram                                    19
3           Class Diagram                                       19
4           Sequence Diagram                                    20
5           Activity Diagram                                    21
6           Different Types of Leaves Images                    24
7           Image Segmentation                                  26
8           Feature Extraction                                  27
9           Comparison - SVM                                    31
10          Epochs - CNN                                        32
11          Comparison - KNN                                    33
12          Comparison - Decision Tree                          34
13          Comparison of Apple Leaves                          35
14          Comparison of Grapes Leaves                         36
15          Comparison of Mango Leaves                          38
16          Comparison of Potato Leaves                         39
17          Comparison of Tomato Leaves                         40
18          MATLAB App for Apple Leaves Disease Prediction      41
19          MATLAB App for Grapes Leaves Disease Prediction     42

LIST OF ABBREVIATIONS

ML Machine Learning
SVM Support Vector Machine
CNN Convolutional Neural Networks
KNN K-Nearest Neighbor
PLDD Plant Leaf Disease Detection
RGB Red Green Blue
SGDM Stochastic Gradient Descent with Momentum
GMM Gaussian Mixture Model
HOG Histogram of Oriented Gradients
LBP Local Binary Pattern

TABLE OF CONTENTS

CHAPTER TITLE PAGE NO


ACKNOWLEDGEMENT iii

ABSTRACT iv
LIST OF FIGURES v
LIST OF ABBREVIATIONS vi

1 INTRODUCTION 1
1.1 DEFINITION 1
1.2 OBJECTIVE 1

1.3 SCOPE OF THE PROJECT 2

2 LITERATURE SURVEY 3
2.1 PAPER 1 3
2.1.1 ADVANTAGES 3
2.1.2 DISADVANTAGES 4
2.2 PAPER 2 4
2.2.1 ADVANTAGES 4
2.2.2 DISADVANTAGES 5
2.3 PAPER 3 5
2.3.1 ADVANTAGES 5
2.3.2 DISADVANTAGES 6
2.4 PAPER 4 6
2.4.1 ADVANTAGES 7
2.4.2 DISADVANTAGES 7
2.5 PAPER 5 7
2.5.1 ADVANTAGES 8
2.5.2 DISADVANTAGES 8
2.6 PAPER 6 9
2.6.1 ADVANTAGES 9
2.6.2 DISADVANTAGES 9
2.7 PAPER 7 10
2.7.1 ADVANTAGES 10
2.7.2 DISADVANTAGES 10
2.8 PAPER 8 11
2.8.1 ADVANTAGES 11
2.8.2 DISADVANTAGES 11
2.9 PAPER 9 12
2.9.1 ADVANTAGES 12
2.9.2 DISADVANTAGES 13
2.10 PAPER 10 13
2.10.1 ADVANTAGES 13
2.10.2 DISADVANTAGES 14

3 SYSTEM DESIGN 15
3.1 SYSTEM REQUIREMENTS 15
3.1.1 HARDWARE CONFIGURATIONS 15
3.1.2 SOFTWARE CONFIGURATIONS 15
3.2 EXISTING SYSTEM 15
3.2.1 DISADVANTAGE OF EXISTING SYSTEM 16
3.3 PROPOSED SYSTEM 16
3.3.1 ADVANTAGE OF PROPOSED SYSTEM 17
3.4 PLDD PROGRESSION PLAN 18
3.5 UML DIAGRAMS 19
3.5.1 USE CASE DIAGRAM 19
3.5.2 CLASS DIAGRAM 19
3.5.3 SEQUENCE DIAGRAM 20
3.5.4 ACTIVITY DIAGRAM 21

4 MODULES DESCRIPTION 23
4.1 OVERVIEW OF THE PROJECT 23
4.2 MODULES 23
4.2.1 DATA COLLECTION 23
4.2.2 IMAGE PREPROCESSING 24
4.2.3 IMAGE SEGMENTATION 26
4.2.4 FEATURE EXTRACTION 27
4.2.5 MODEL TRAINING 27
4.2.6 MODEL EVALUATION 28
4.2.7 MODEL FINE-TUNING 29
4.3 ALGORITHMS 30
4.3.1 SUPPORT VECTOR MACHINE 30
4.3.2 CONVOLUTIONAL NEURAL NETWORK 32
4.3.3 K-NEAREST NEIGHBOR 33
4.3.4 DECISION TREE 34
4.4 ANALYSIS OF DATASETS 35
4.4.1 APPLE DATASET 35
4.4.2 GRAPES DATASET 36
4.4.3 MANGO DATASET 37
4.4.4 POTATO DATASET 38
4.4.5 TOMATO DATASET 39

5 TESTING 41
5.1 TESTING OF THE APPLICATION 41
6 CONCLUSION AND FUTURE ENHANCEMENTS 43


6.1 CONCLUSION 43
6.2 FUTURE ENHANCEMENTS 44

APPENDICES
APPENDIX 1
APPENDIX 2

REFERENCES
CHAPTER 1
INTRODUCTION

1.1 DEFINITION
Agriculture is the backbone of human civilization, giving people all over
the world a means of subsistence and a living. However, plant diseases are
becoming more common and pose a serious threat to the security of the world's
food supply. The significant agricultural losses caused by outbreaks of these
diseases have heightened concerns about meeting the nutritional requirements
of a growing population. One of the most essential
goals of research on disease diagnosis is to achieve high accuracy and low
latency; this is not an easy task, particularly when dealing with
computationally demanding models. In this project we investigate the
potential of Machine Learning (ML) algorithms to change how plant diseases are
understood, diagnosed, and managed in the modern
environment of data-driven decision-making, where computational power and
creativity converge. In this regard, ML algorithms present a viable path
for disease detection and management in terms of accuracy, speed, and scalability.
We want to rethink the paradigms of disease identification, mitigation, and
prevention by drawing on large collections of agricultural data and
sophisticated algorithms. Automated systems with machine learning capabilities
offer significant potential for early diagnosis, prompt response, and
tailored interventions. This might lessen the detrimental impact of diseases on
crop yield and preserve the resilience of agricultural ecosystems.

1.2 OBJECTIVE
The primary objective is to develop a unique approach for automating the
early identification of plant leaf diseases. To achieve this goal, the project
uses the complementary properties of four different algorithms, namely Decision
Tree, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and
Convolutional Neural Networks (CNN). By using several algorithms, the
research seeks to maximize disease detection accuracy. This holistic
strategy reflects a detailed understanding of the various machine
learning techniques that are available.

1.3 SCOPE OF THE PROJECT


Unlike the current trend, which uses only one algorithm, our methodology
stands out by using four different algorithms. This diverse approach makes a
thorough exploration possible. Our process puts an emphasis on experimentation
and validation, using real-time images taken directly from the field as well as
image datasets available on the internet. By including field data, our conclusions
become more genuine and applicable, which makes it easier to evaluate the
system's performance in real-world situations. By means of methodical
comparison and scrutiny, we aim to determine the best algorithmic framework
suited to certain plant species and disease profiles.

CHAPTER 2
LITERATURE SURVEY
2.1 Plant Leaf Disease Detection Using Image Processing
Rahul Kundu, Usha Chauhan, S.P.S Chauhan, 2022

Global food security and agricultural output are seriously threatened by
plant diseases. The prompt and precise identification of such diseases is essential
for efficient disease control and prevention. In this work, we describe a unique
method that uses image processing techniques to identify diseases of plant leaves.
Our method leverages advanced algorithms for image segmentation and feature
extraction to identify regions of interest indicative of disease presence. Specifically,
we explore the application of AlexNet and Convolutional Neural
Networks (CNNs) to examine leaf images and classify them based on disease
symptoms. Through extensive experimentation and evaluation, we demonstrate our
method's reliability in quickly recognizing a range of plant diseases. The
proposed methodology offers a promising avenue for early disease diagnosis and
proactive intervention, thereby enhancing agricultural production and
sustainability. Our findings contribute to the ongoing efforts aimed at leveraging
technology to address critical challenges in agriculture and food production.

2.1.1 ADVANTAGES

• Offers a cutting-edge technique for identifying plant diseases using
  image processing, advancing the field.
• Utilizes sophisticated algorithms like AlexNet and CNNs, enhancing
  accuracy.
• Offers practical benefits for agriculture, enabling early disease
  detection and intervention.

2.1.2 DISADVANTAGES

• Lacks comprehensive validation, potentially limiting generalizability.
• Relies heavily on the availability and quality of training data.

2.2 Plant Disease Detection Techniques: A Review
Kurleen Kaur Sandhu, Rajbir Kaur, 2019

Plant diseases cause significant losses in the output, economic value,
quality, and quantity of agricultural products. These losses must be
managed because, according to the paper, about 70% of India's economy depends
on agricultural output. To prevent these diseases, plants must be monitored from the
very beginning of their life cycle. Traditionally, this monitoring has been done by
naked-eye inspection, which is costly, time-consuming, and requires a high
level of expertise. The disease detection system therefore has to be automated
in order to expedite this procedure, and image processing algorithms must be
developed for it. Several researchers have created
systems based on distinct image processing methodologies. This study examines
the potential of plant leaf disease detection systems to promote agricultural
development.

2.2.1 ADVANTAGES

• Offers an extensive review of plant disease detection techniques,
  enhancing understanding.
• Helps researchers grasp the strengths and limitations of existing methods,
  aiding decision-making.
• Pinpoints areas needing further investigation, guiding future studies.

2.2.2 DISADVANTAGES

• As a review, it presents no original research findings or experimental
  validation.
• Depending on the selection criteria and methodology used in the
  review process, there is a possibility of bias or subjectivity in the
  evaluation and interpretation of the literature, which could affect the
  reliability of the conclusions drawn.

2.3 Leaf Disease Detection and Classification based on Machine Learning
Sandeep Kumar, KMVV Prasad, A. Srilekha, T. Suman, B.
Pranav Rao, J. Naga Vamshi Krishna, 2020

Detecting plant diseases is among the most vital tasks in agriculture, and it
has a major effect on the economy. Given how prevalent
plant diseases are, finding infections in plants is an important part of working in
the agriculture sector. Identifying diseases in leaves requires constant
observation of the plants, and this ongoing inspection and monitoring
demands a lot of human labor and time. Put simply, examining the plants
calls for some kind of systematic approach. Automated methods make plant
diseases easier to identify and save time and effort
when identifying damaged leaves. Compared with existing methods, the
suggested algorithm can determine and classify infected plants more reliably.

2.3.1 ADVANTAGES

• Introduces novel leaf disease detection using machine learning,
  potentially improving efficiency.
• Automates disease detection, reducing manual effort and time.
• Machine learning adapts to different species and diseases, enhancing
  versatility.

2.3.2 DISADVANTAGES

• The authenticity and quality of the training data have an important
  effect on how well the machine learning model performs.
• Implementing machine learning algorithms and deploying the system
  may require specialized technical expertise, potentially posing
  challenges for users with limited computational or programming skills.

2.4 Plant Disease Detection and Diagnosis using Deep Learning
R. Senthil Kumar, Amarjeeth Singh, Hema Jaisree S V,
Aishwarya D, J S Jayasree, 2022

Agriculture is essential to the development of a nation, and food is the
foundation of human health. Nonetheless, we found that diseases have
a significant impact on plants, which are the source of that food. Plant diseases
are therefore a pressing problem that has to be addressed. Even in this age of
advanced technology, some farmers are still unable to obtain professional
advice because of budget constraints and the added difficulty of traveling long
distances for consultations. For farmers, identifying plant diseases takes a lot of
effort and requires periodic field visits by professionals to assess the situation.
If plant diseases are detected accurately and promptly, farmers will be able to
diagnose conditions more successfully and reduce the associated financial
losses. Compared with other approaches such as Machine
Learning algorithms, which can address well-structured problems, Deep
Learning models are more beneficial since they perform image processing and feature
extraction without the need for human intervention. Convolutional neural
networks (CNNs) are used in our methods to identify and diagnose plant diseases.
Nevertheless, deeper architectures are needed for more complex data
applications.

2.4.1 ADVANTAGES

• Offers an unconventional method for diagnosing and monitoring plant
  diseases using deep learning techniques, potentially offering superior
  accuracy and efficiency compared to conventional methods.
• Deep learning models may reduce the need for manual inspection and
  intervention by automating the detection and diagnostic process, thus
  saving time and labor costs.
• Large datasets may be used to develop deep learning models,
  allowing for scalability across different plant species and potentially
  improving generalization.

2.4.2 DISADVANTAGES

• The model's efficacy depends on the quantity and quality of labeled
  training data, which may be limited or biased, affecting the model's
  performance and generalizability.
• Training and running deep learning models often require
  substantial computing power, including high-performance hardware
  and memory, which may be a barrier for researchers or practitioners
  with limited access to such resources.

2.5 Plant Disease Detection Using Machine Learning
Ebrahim Hirani, Varun Magotra, Jainam Jain, Pramod Bide,
2021

Convolutional neural networks have been investigated recently for a variety
of applications, including image segmentation, feature extraction, and
classification. Since plant disease is one of the main causes of low production in
the agriculture industry, one of those applications is the detection of plant
diseases. Plant disease detection and classification have been addressed using a
variety of deep learning techniques over time. Transformer networks have
recently shown a lot of potential in computer vision problems. This study
compares these methods with conventional CNN methods for detecting plant
diseases.

2.5.1 ADVANTAGES

• It offers an innovative approach to plant disease identification that
  makes use of machine learning techniques, which might result in
  diagnoses that are more accurate and efficient than those of more
  conventional approaches.
• Deep learning models enable automation of the detection process,
  reducing the need for manual labor and enabling faster and more
  consistent identification of plant diseases.
• The system's scalability and adaptability may be improved because
  deep learning algorithms can be trained on large amounts of data and
  adapted to different plant species and disease types.

2.5.2 DISADVANTAGES

• The quality and diversity of the initial training data significantly
  influence the performance of deep learning models. Poor or
  unbalanced datasets might lead to inaccurate conclusions and
  inadequate findings.
• Implementing and tuning deep learning models requires machine
  learning expertise and computational resources, which may
  pose challenges for researchers or practitioners with limited technical
  knowledge or access to resources.

2.6 Plant Disease Detection Using Machine Learning Techniques
D. Varshney, B. Babukhanwala, J. Khan, D. Saxena, 2022

The article applies machine learning methods to plant disease
identification. It emphasizes the inefficiencies of traditional
methods and the potential of machine learning algorithms to improve accuracy
and efficiency. Early disease identification is crucial in agriculture, and
machine learning can help with this problem. The article may
also briefly mention the specific techniques and datasets used in the study, as well
as the expected outcomes or contributions to the field.

2.6.1 ADVANTAGES

• Contributes to the development of agricultural technology by
  introducing the use of machine learning approaches for plant disease
  detection.
• By increasing the accuracy of disease detection, machine learning
  algorithms might eventually lead to more effective approaches to
  disease control when compared to conventional methods.

2.6.2 DISADVANTAGES

• The authenticity and quality of the training data have an enormous
  effect on how well predictive models work.
• Implementing and fine-tuning machine learning algorithms requires
  specialized technical expertise and computational resources,
  potentially limiting accessibility to researchers or practitioners with
  limited technical backgrounds or resources.

2.7 Plant Disease Detection Using CNN
Garima Shrestha, Deepsikha, Majolica Das, Naiwrita Dey, 2022

Agricultural productivity is a major factor in India's economy.
Consequently, agricultural products and commodities play a critical role in the
ecosystem and in human welfare. Numerous diseases destroy crops
every year. Many plants die as a result of improper diagnosis of these diseases,
ignorance of their symptoms, and lack of knowledge about remedies. Here, a
CNN-based technique for identifying plant diseases is put forward. A simulation
study and analysis are carried out on sample images with respect to the area of
the affected region and the time complexity, using image processing
techniques.

2.7.1 ADVANTAGES

• Introduces the use of Convolutional Neural Networks (CNN) for plant
  disease detection, indicating a sophisticated and state-of-the-art approach
  to solving the problem.
• CNNs have demonstrated high accuracy in image classification tasks,
  suggesting that the proposed methodology may lead to accurate and
  reliable detection of plant diseases.

2.7.2 DISADVANTAGES

• CNN performance is highly dependent on the quantity and quality of
  annotated training data.
• Implementing and training CNN models requires significant
  computational resources, including high-performance hardware and
  memory, which is a barrier for researchers or practitioners with limited
  access to such resources.

2.8 An Efficient Algorithm for Plant Disease Detection Using Deep
Convolutional Networks
Pratibha Nayar, Shivank Chhibber, Ashwani Kumar Dubey,
2022

The presence of pests and plant diseases is a major factor affecting crop
quality and production. Plant diseases may be extremely harmful to farmers
whose livelihoods depend on producing healthy crops, in addition to posing a
danger to global food security. Practical agricultural productivity relies
substantially on the identification of plant diseases, which supports the normal
growth and productive harvest of crops while safeguarding
plant development and health. Climate is only one of the many factors that
influence the diseases that affect plants. This article examines a different way
of using deep convolutional networks to build a
disease detection model for leaf labeling. The market for applications of computer
vision in precision agriculture is expected to grow as computer vision technology
advances, offering opportunities to improve and widen the practice.

2.8.1 ADVANTAGES

• The study indicates that the deep convolutional network
  method is effective for plant disease identification, with potential
  improvements in computational efficiency compared to existing
  methods.
• Utilizing deep convolutional networks represents a modern and powerful
  approach to image analysis.

2.8.2 DISADVANTAGES

• Implementing and training deep convolutional networks typically
  requires significant computational resources.
• The availability and quality of labelled training data are critical
  factors that determine the effectiveness of deep learning systems.

2.9 Plant Disease Detection and Classification by Deep Learning: A Review
Lili Li, Shujuan Zhang, Bin Wang, 2021

Deep learning is a branch of artificial intelligence. Because of its
strengths in automatic learning and feature extraction, it has attracted
increasing attention from academic and industrial circles in recent years. Deep
learning applications in plant disease detection can avoid the drawbacks of
manually selected disease-spot features, make plant
disease feature extraction more objective, and speed up research and the
transfer of technology into practice. This review covers the research progress
of deep learning technology in the
field of crop leaf disease identification over the past several years.
In this work, we describe the state of the art and the
difficulties in detecting plant leaf disease with deep learning and advanced
imaging methods. We anticipate that our work will be a useful resource for
researchers investigating the identification of insect pests and plant diseases.
In addition, we discuss some of the challenges and problems that still need to
be solved.

2.9.1 ADVANTAGES

• The review may help identify gaps in current research and propose
  prospective topics for future investigation.
• It provides insights into the application of deep learning algorithms for
  plant disease detection and classification.

2.9.2 DISADVANTAGES

• Since it is a review paper, it may not present original research findings
  or experimental validation of the discussed methodologies.
• Depending on the scope and focus of the review, it may overlook certain
  emerging trends or methodologies in plant disease detection.

2.10 LeafLife: Deep Learning Based Plant Disease Detection Application
M. S. Soyer, C. Yılmaz, I. M. Ozcan, F. Cogen and T. C. Yıldız,
2021

In this study, plant diseases are detected and
diagnosed by the combined use of image processing and Convolutional Neural
Networks (CNN). A CNN was built using a data set containing images of healthy
and diseased plant leaves. The model was trained on a
data set of different leaf photos and reached a good estimated accuracy.
Additionally, an app has been created to identify diseased plants using the same
data set. By taking an image of a suspected diseased leaf with this app, it is
possible to determine early whether the leaf is actually diseased
and what kind of disease it has. This makes it possible to stop the disease's
spread as quickly as possible.

2.10.1 ADVANTAGES

• Provides an effective way to detect plant diseases, which may enable
  farmers and researchers to properly track crop health.
• Demonstrates the applicability of deep learning techniques in
  real-world scenarios, highlighting their effectiveness and potential for
  widespread adoption.

2.10.2 DISADVANTAGES

• The paper may lack extensive validation and evaluation of the
  "LeafLife" application in diverse agricultural settings, potentially
  limiting its reliability and generalizability.
• Implementing deep learning may require significant technical
  expertise and computational resources, which could pose challenges
  for adoption and scalability, especially in resource-constrained
  environments.

CHAPTER 3
SYSTEM DESIGN

3.1 SYSTEM REQUIREMENTS

3.1.1 Hardware Configurations

• PROCESSOR - 11th Gen Intel® Core™ i7-11700
• SPEED - 2.50 GHz
• RAM - 16.0 GB DDR4 RAM

3.1.2 Software Configurations

• OPERATING SYSTEM - Windows 8/10/11
• IDE - MATLAB 2024b
• LANGUAGE - Python 3.12

3.2 EXISTING SYSTEM


The existing systems rely strongly on pre-processed datasets,
which severely restricts their capacity to handle a wide range of plant leaf images.
Systems that use only one machine learning technique may not be able
to capture the subtle differences between different kinds of plant leaves. Moreover,
the images are often taken from detached leaves that are brought in for
processing, rather than captured directly from living plants. As a result, the
application of these systems to real-time plant leaf analysis is hampered by their
inability to deliver prompt insights from real-time data streams. Diversifying the
machine learning techniques used is necessary to overcome these obstacles and
enable a more thorough examination of plant leaf images. To provide
real-time analysis without the need for pre-collected datasets, efforts should also
be focused on building the capacity to analyze data directly from living plants.
By broadening the range of machine learning approaches and including live data
processing, such systems can handle the dynamic nature of plant leaf
analysis in real-world conditions more effectively. This improves
the precision and effectiveness of plant leaf analysis while creating new
opportunities for use in ecology, agriculture, and environmental monitoring.

3.2.1 DISADVANTAGES

• Limited Adaptability: The system's capacity to adjust to new or
  emerging plant species and changes in leaf attributes is limited by its
  reliance on pre-processed information.
• Single Algorithm Approach: Relying on only one machine learning
  technique might result in less accurate analysis, since it
  could miss significant details and traits that are present in many types
  of plant leaves.
• Non-Real-Time Analysis: Because its analysis is based on detached
  plant leaves that have been brought in for inspection, the system is
  unable to offer real-time insights when presented with live data streams
  from plants in their natural habitat.

3.3 PROPOSED SYSTEM


The proposed system is an innovative approach to automated plant disease
diagnosis that uses four machine learning algorithms:
Support Vector Machine (SVM), Convolutional Neural Networks (CNN),
Decision Tree, and K-Nearest Neighbor (KNN). By drawing on the strengths of these
algorithms, the system aims to deliver dependable results and efficient processing
from image input to prediction, which makes it easier to identify crop diseases
early and accurately. This preventive approach minimizes crop losses and lowers
the need for chemical treatments, supporting sustainable agriculture and
environmental preservation. Furthermore, the system provides farmers with
decision-support resources that enable them to adopt environmentally and
economically beneficial practices. The proposed approach therefore not only
changes how agriculture manages disease but also lays the
groundwork for a more robust and sustainable food production system that can
help secure food for coming generations.

3.3.1 ADVANTAGES

• Enhanced Decision-Making: By providing farmers with data-driven insights,
  the system enhances their ability to make well-informed decisions and to
  allocate resources for disease control efficiently.

• Sustainable Agriculture: The system encourages sustainable agriculture
  practices that protect soil health and ecosystem balance by emphasizing
  early detection and reducing the use of chemicals.

• Cost-Efficiency: The method provides economical solutions for disease
  control in agriculture by limiting crop losses and lowering the requirement
  for costly pesticides.

• Improved Accuracy: Combining several machine learning methods increases
  disease diagnosis accuracy while reducing false positives and false
  negatives.

3.4 PLDD PROGRESSION PLAN

Figure 1: PLDD Progression Plan

• The proposed system assists in early detection by automating the diagnosis
  of plant leaf diseases, which supports sustainable agriculture.
• The system architecture describes the flow of data and decisions, from
  image input to disease prediction, as follows:
  ◦ Data Collection - A large dataset of a plant type is gathered and
    separated into healthy and diseased images.
  ◦ Image Preprocessing - Images are processed and resized to the dimensions
    required by the model architecture.
  ◦ Feature Extraction - Characteristics such as the colour, texture and
    shape of the leaves are analysed from the images.
  ◦ Model Training - The models are trained using the ML algorithms after
    dividing the dataset into training (80%) and testing (20%) sets.
  ◦ Model Fine-Tuning - After the analysis of the training image set, the
    process is performed on the testing images.
  ◦ Accuracy - The process returns the disease percentage of the testing
    images.
• By following this flow, the system completes its workflow: it processes the
  incoming data and produces the desired results.

3.5 UML DIAGRAMS
3.5.1 USE CASE DIAGRAM

Figure 2: Use Case Diagram

3.5.2 CLASS DIAGRAM

Figure 3: Class Diagram

• Classes such as imageDatastore, imread, imhist and predict are directly
  involved in data processing, image reading, and model training/prediction.
• The classes Data Collection, Image Preprocessing, Image Segmentation,
  Feature Extraction, Model Training, Model Evaluation, and Model Fine Tuning
  represent the sequential steps in the process flow.
• Arrows indicate the flow of data or process from one step to another, or
  the use of specific functions/classes within each step.

3.5.3 SEQUENCE DIAGRAM

Figure 4: Sequence Diagram

• The sequence diagram starts with the "User" participant and depicts
  interactions with other components or modules in the system.
• Each participant represents a module or a logical unit of functionality.
• Arrows indicate the flow of control or communication between participants.
• Loops represent iterative processes within the system.
• The sequence diagram provides a clear overview of the sequence of actions
  and interactions between components.

3.5.4 ACTIVITY DIAGRAM

Figure 5: Activity Diagram

• The activity diagram starts with the "start" node and ends with the "stop"
  node.
• Activities such as loading leaf images, creating labels, combining data,
  initializing arrays, and processing images are represented as individual
  actions in the diagram.
• The "while" loop represents the iterative process of computing color
  histograms for each image in the "allImages" array.
• Each activity is represented by a rectangular box, and the arrows indicate
  the flow of activities.
• The activity diagram provides a high-level overview of the activities
  performed and their sequence.

CHAPTER 4
MODULES DESCRIPTION
4.1 OVERVIEW OF THE PROJECT
The Support Vector Machine (SVM), Convolutional Neural Network (CNN),
K-Nearest Neighbor (KNN), and Decision Tree algorithms are used to overcome
the difficulty of inconsistent plant leaf image databases. Using MATLAB for
numerical computing, the study compares the effectiveness of each algorithm.
SVM and CNN achieved 98% accuracy, with KNN and Decision Tree close behind at
96%. This algorithmic diversity supports a nuanced and adaptable system for
early disease identification in plant leaves, offering a promising avenue for
sustainable agriculture and timely intervention.

4.2 MODULES
• Data Collection
• Image Preprocessing
• Image Segmentation
• Feature Extraction
• Model Training
• Model Evaluation
• Model Fine-Tuning

4.2.1 DATA COLLECTION

In this study, we employed a publicly available dataset for the identification
of plant diseases in leaves. The dataset comprises 41676 images, which have
been categorised into 5 plant types and further subdivided into healthy and
diseased subfolders. The diseased images are further categorised by specific
disease name, as illustrated in Figure 6.

Figure 6: Different Types of Leaves Images

This comprehensive representation supports nuanced analysis and robust model
training, so that the resulting insights and solutions are better equipped to
address real-world scenarios encountered in farming. The careful curation of
the dataset also lays a solid foundation for accurate findings in disease
detection, prevention, and management strategies for the crops covered.

4.2.2 IMAGE PREPROCESSING


Image preprocessing plays a critical role in preparing datasets for analysis,
particularly in fields like agriculture where precise identification and
diagnosis of plant diseases are paramount. For our dataset, extensive
preprocessing was carried out to ensure the quality and consistency of the
data, thereby improving the efficacy of subsequent analysis and modeling.

At the outset, all images were resized to a standard resolution of 256x256
pixels. This resizing step serves two purposes. First, it promotes consistency
by standardizing the size of every image in the collection, so that the input
fed into the machine learning algorithms has a consistent format and size and
variations that could affect model performance are minimized. Second, it
avoids the computational difficulties of working with images of varying sizes,
which streamlines the analysis and improves computational efficiency.

Following resizing, the dataset was normalized. Normalization adjusts pixel
values to a standardized scale, typically ranging from 0 to 1 or from -1 to 1,
so that pixel values across different images are comparable and consistent.
Normalization can also help reduce the effects of variations in lighting
conditions, exposure settings, and camera characteristics that may be present
in the raw images. By standardizing pixel values, normalization makes the
dataset more robust to the variations encountered in real-world scenarios.
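
The normalization step described above can be sketched as follows. This is a
minimal illustration, assuming 8-bit RGB input; the synthetic image and the
variable names img01 and img11 are assumptions made for the example and are not
taken from the project code. Resizing is shown only as a comment because
imresize belongs to the Image Processing Toolbox.

```matlab
% Illustrative only: a synthetic 4x4 RGB image stands in for a leaf photo.
img = uint8(randi([0 255], 4, 4, 3));

% In the real pipeline the image would first be resized, for example:
%   img = imresize(img, [256 256]);   % requires Image Processing Toolbox

% Scale 8-bit pixel values to the range [0, 1].
img01 = double(img) / 255;

% Alternative scaling to the range [-1, 1].
img11 = 2 * img01 - 1;

fprintf('Range [0,1]:  min %.3f, max %.3f\n', min(img01(:)), max(img01(:)));
fprintf('Range [-1,1]: min %.3f, max %.3f\n', min(img11(:)), max(img11(:)));
```

Either range can be used, provided the same scaling is applied to the training
and testing images.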

Together, these preprocessing steps significantly improved the quality and
consistency of the dataset. The resized images are uniform in size, which
simplifies their integration into machine learning pipelines. Normalization
standardizes pixel values, making the dataset more amenable to quantitative
analysis and model training. Noise reduction, which was also applied at this
stage, improves the clarity and interpretability of the images and so supports
more accurate disease detection and diagnosis.

With resizing, normalization, and noise reduction applied, the dataset images
are well prepared for subsequent analysis and modeling. These steps lay a solid
foundation for developing robust machine learning models and for generating
actionable insights that support sustainable agriculture practices and crop
productivity.

4.2.3 IMAGE SEGMENTATION


Image segmentation is essential when analyzing the images in the dataset
because it separates the different regions in each photograph. The process
partitions the images into regions using thresholding and morphological
operations. Segmenting the images lets us focus on specific regions of
interest, such as diseased areas or leaf structures, which makes analysis and
feature extraction more targeted. Image segmentation also helps improve the
accuracy of subsequent tasks, such as disease detection and classification, by
isolating the relevant regions for further processing.
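
A minimal sketch of thresholding followed by a morphological opening is given
below. It is an illustration under stated assumptions, not the project's
segmentation code: the 9x9 synthetic image, the cut-off of 0.5 and the 3x3
structuring element are all chosen for the example. The opening is written with
conv2 so that the sketch runs in base MATLAB; in practice, functions such as
imbinarize and imopen from the Image Processing Toolbox would typically be used.

```matlab
% Illustrative only: a synthetic grayscale image with one bright 5x5 region
% and one isolated noisy pixel. Values are in [0, 1].
gray = zeros(9, 9);
gray(3:7, 3:7) = 0.8;   % region of interest
gray(1, 9)     = 0.9;   % isolated noise pixel

% Thresholding: keep pixels brighter than an assumed cut-off.
threshold = 0.5;
mask = gray > threshold;

% Morphological opening with a 3x3 square structuring element.
% Erosion keeps a pixel only if all 9 pixels in its neighbourhood are set;
% dilation then grows the surviving pixels back to the original extent.
se = ones(3);
eroded = conv2(double(mask), se, 'same') == 9;
opened = conv2(double(eroded), se, 'same') > 0;

fprintf('Pixels after thresholding: %d\n', nnz(mask));
fprintf('Pixels after opening:      %d\n', nnz(opened));
```

The opening removes the isolated pixel while leaving the larger region intact,
which is the usual reason for following a threshold with a morphological step.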

Figure 7: Image Segmentation

By segmenting the images in the dataset, we create a foundation for more
detailed analysis and feature extraction, ultimately contributing to a better
understanding of plant leaf diseases and more accurate diagnostic models.

4.2.4 FEATURE EXTRACTION
Feature extraction identifies the traits of a plant leaf image that help
differentiate healthy leaves from unhealthy ones. Our method extracts
characteristics from the leaves according to their color, shape, and texture.
Color histograms capture the distribution of pixel intensities, providing
insight into the overall color composition of the leaves. Shape
characteristics, such as leaf morphology and size, offer valuable information
about structural differences between healthy and diseased leaves.

Figure 8: Feature Extraction


Texture features, including patterns and surface properties, complement the
analysis by capturing subtle variations in leaf texture associated with
different diseases. By extracting a diverse set of features, we obtain a
comprehensive representation of plant leaf diseases, enabling more accurate
and robust disease identification and classification.
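
As an illustration of the colour feature, the sketch below builds a normalized
colour histogram for one image with histcounts from base MATLAB. It is a sketch
under stated assumptions: the synthetic image, the choice of 16 bins and the
variable names are hypothetical, and dividing by the pixel count is an added
step so that images of different sizes give comparable vectors.

```matlab
% Illustrative only: a synthetic 8x8 RGB image stands in for a leaf photo.
img = uint8(randi([0 255], 8, 8, 3));

numBins = 16;                            % assumed bin count for the example
edges = linspace(0, 256, numBins + 1);   % bin edges covering values 0..255

features = zeros(1, 3 * numBins);        % one histogram per colour channel
for c = 1:3
    channel = double(img(:, :, c));
    counts = histcounts(channel(:), edges);
    % Divide by the pixel count so each channel histogram sums to 1.
    features((c - 1) * numBins + (1:numBins)) = counts / numel(channel);
end

fprintf('Feature vector length: %d\n', numel(features));
fprintf('Sum per channel: %.2f %.2f %.2f\n', sum(reshape(features, numBins, 3)));
```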

4.2.5 MODEL TRAINING


In the model training phase, we adopt a multi-algorithm approach to machine
learning, drawing on the Decision Tree, K-Nearest Neighbors (KNN), Support
Vector Machine (SVM), and Convolutional Neural Network (CNN) algorithms. These
algorithms offer complementary strengths in capturing different aspects of the
dataset, which improves the accuracy of disease recognition. The dataset is
divided into 80% for training and 20% for testing, so that the models are
trained on a sufficiently large and diverse set of examples while their
performance can still be evaluated rigorously on unseen data. By combining
multiple machine learning techniques, we capitalize on the strengths of each
approach, resulting in a more robust and accurate disease detection system.
This approach holds promise for improving agricultural practices by enabling
early and precise identification of plant leaf diseases, ultimately leading to
more effective disease management and crop protection strategies.
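
The 80/20 division can be sketched with a random permutation of the sample
indices. This is a minimal illustration, not the project's code: the dataset
size of 100, the label vector and the fixed seed are assumptions made for the
example.

```matlab
rng(0);                                  % fixed seed so the split is repeatable
numSamples = 100;                        % hypothetical dataset size
labels = [zeros(40, 1); ones(60, 1)];    % 0 = healthy, 1 = diseased

% Shuffle the indices, then take the first 80% for training.
shuffled = randperm(numSamples);
numTrain = round(0.8 * numSamples);
trainIdx = shuffled(1:numTrain);
testIdx  = shuffled(numTrain + 1:end);

fprintf('Training samples: %d, testing samples: %d\n', ...
    numel(trainIdx), numel(testIdx));
fprintf('Diseased share in training set: %.2f\n', mean(labels(trainIdx)));
```

When the images are held in an imageDatastore, the toolbox function
splitEachLabel offers a comparable split that keeps the proportion of each
label the same in both parts.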

4.2.6 MODEL EVALUATION


The hybrid model, trained on 80% of the plant leaf images in the dataset, is
then subjected to the evaluation phase using the remaining 20%. This leads to
an improvement in accuracy, precision, and computing efficiency. The model
therefore demonstrates a robust disease detection capability and performs best
when compared with the other combinations. Accuracy is the number of correct
predictions on the test set divided by the total number of predictions.
The following formula is used to calculate the accuracy mathematically:

$$\text{Accuracy} = \frac{C_p}{T_p}$$

where $C_p$ is the number of correct predictions and $T_p$ is the total number
of predictions.

In MATLAB, this is achieved by taking the mean value of the logical array
resulting from the element-wise comparison of predicted and true labels.
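
The calculation can be sketched in a few lines. The label vectors below are
hypothetical values chosen for the example, not results from the project.

```matlab
trueLabels      = [0 1 1 0 1 1 0 1];   % hypothetical ground truth
predictedLabels = [0 1 0 0 1 1 1 1];   % hypothetical model output

% Element-wise comparison gives a logical array (true where they agree).
correct = predictedLabels == trueLabels;

Cp = sum(correct);              % number of correct predictions
Tp = numel(predictedLabels);    % total number of predictions
accuracy = Cp / Tp;

% Taking the mean of the logical array gives the same value.
accuracyViaMean = mean(correct);

fprintf('Accuracy: %.2f%% (%d of %d correct)\n', 100 * accuracy, Cp, Tp);
```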

The evaluation phase of the hybrid model, which was trained on 80% of the
plant leaves in the dataset, marks a critical step in assessing its
performance and computational efficiency. Careful examination of performance
indicators, including accuracy and precision, shows the model's strong disease
detection capability and establishes it as the best solution among the
combinations tested. Computational efficiency is a key consideration in the
evaluation, especially for real-world deployment where timely decision-making
is crucial. The trained hybrid model shows its effectiveness in real-world
applications by increasing computing efficiency without compromising
performance indicators such as accuracy and precision.

The accuracy metric, a key measure of the model's efficacy, is computed as the
number of correct predictions ($C_p$) divided by the total number of
predictions ($T_p$). It expresses mathematically the ability of the model to
categorize the samples in the test set correctly. In MATLAB, the computation
is simplified by calculating the mean of the logical array that results from
comparing the true and predicted labels element by element. Applying this
formula quantifies the accuracy of the trained hybrid model and provides
valuable insight into its performance in disease detection tasks. The
resulting accuracy serves as a benchmark for comparing the model against
alternative combinations and for assessing its suitability for deployment in
agricultural settings.

4.2.7 MODEL FINE-TUNING


Model fine-tuning is an essential stage of the machine learning pipeline in
which the performance of a pre-trained model is tuned further for a particular
task or dataset. In our study, fine-tuning was based on the evaluation results
of the initial model, with a focus on adjusting hyperparameters such as kernel
sizes and learning rates to enhance model performance. This iterative
refinement aims to maximize the model's accuracy and effectiveness in
identifying plant leaf diseases, ultimately leading to more reliable and
actionable insights for agricultural stakeholders.

The first step in fine-tuning was to evaluate the original model's performance
on the dataset of plant leaves. This assessment indicated opportunities for
improvement and offered useful information about the model's strengths and
weaknesses. Metrics including accuracy, precision, recall, and F1 score were
examined to obtain a thorough understanding of the model's behavior and its
capacity to distinguish between healthy and diseased plant leaves. Based on
the evaluation results, the model's hyperparameters were adjusted to optimize
its performance. Kernel sizes, which determine the spatial extent of the
convolutional filters in convolutional neural networks (CNNs), were tuned to
capture relevant features in the input images more effectively. By
experimenting with different kernel sizes and architectures, we aimed to
improve the model's ability to extract meaningful patterns and distinguish
between different types of plant leaf diseases.
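
For reference, precision, recall, and F1 score can be computed from the counts
of true positives, false positives, and false negatives. The sketch below
treats diseased leaves as the positive class and uses hypothetical label
vectors chosen for the example; it is an illustration, not the project's
evaluation code.

```matlab
% Hypothetical labels: 1 = diseased (positive class), 0 = healthy.
trueLabels      = [0 1 1 0 1 1 0 1];
predictedLabels = [0 1 0 0 1 1 1 1];

tp = sum(predictedLabels == 1 & trueLabels == 1);   % true positives
fp = sum(predictedLabels == 1 & trueLabels == 0);   % false positives
fn = sum(predictedLabels == 0 & trueLabels == 1);   % false negatives

precision = tp / (tp + fp);   % share of predicted diseased that are diseased
recall    = tp / (tp + fn);   % share of diseased leaves that were found
f1        = 2 * precision * recall / (precision + recall);

fprintf('Precision: %.2f, Recall: %.2f, F1: %.2f\n', precision, recall, f1);
```
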
Throughout the fine-tuning process, the model's performance was monitored on a
validation dataset. This iterative approach allowed us to evaluate
systematically the impact of hyperparameter adjustments on the model's
accuracy and generalization ability. By refining the model on the basis of
this feedback, we aimed to achieve the highest possible accuracy in
identifying plant leaf diseases while minimizing the risk of overfitting or
underfitting.

The improvement in accuracy obtained in this way indicates that fine-tuning is
effective in refining the model's sensitivity and specificity in detecting
plant leaf diseases. With optimized hyperparameters and a fine-tuned
architecture, the model is better equipped to support decision-making in
agriculture, facilitating prompt and focused responses that reduce crop losses
and support food security.

4.3 ALGORITHMS

4.3.1 SUPPORT VECTOR MACHINE (SVM)

The Support Vector Machine (SVM) is a powerful ML technique for linear and
nonlinear classification, regression, and outlier detection. SVMs can handle
high-dimensional data and nonlinear relationships, which makes them versatile
and effective in a range of applications. An SVM works by finding the
hyperplane that separates the classes of the target feature with the largest
possible margin.

The script uses a Support Vector Machine (SVM) based on colour histograms to
categorize leaf images as healthy or unhealthy. First, it loads photos of
healthy and diseased leaves from separate folders and labels them accordingly.
Each image's colour histogram is calculated by splitting the image into its
RGB channels and computing a histogram for each one. Feature vectors are
created by concatenating these histograms. The same process is used to extract
features from the mixed test photos after they have been loaded. The colour
histograms and their labels are then used to train the SVM model with a linear
kernel function. Lastly, the trained model predicts the labels of the mixed
test photos, and the proportion of unhealthy leaves in the mixed folder is
computed and printed. The code thus illustrates a basic image classification
procedure that distinguishes healthy from unhealthy leaves using colour
information alone.

Figure 9: Comparison – SVM

4.3.2 CONVOLUTIONAL NEURAL NETWORKS (CNN)
Convolutional Neural Networks (CNNs) are deep learning algorithms that perform
very well in image recognition and processing tasks. A CNN is composed of
several kinds of layers: convolutional layers extract feature maps from the
image, pooling layers downsample the feature maps while preserving the most
important information, and one or more fully connected layers then predict or
classify the image.

The script uses a CNN to identify leaf images as either healthy or unhealthy.
It first specifies the paths to the folders that hold pictures of healthy,
diseased, and mixed leaves. Next, it loads the images from the healthy and
diseased directories using imageDatastore and resizes them with a custom
function called resizeImage. It then merges the datastores containing the
healthy and diseased photos. The CNN architecture is defined next, consisting
of layers for image input, convolution, batch normalization, ReLU activation,
max pooling, and fully connected layers, followed by softmax and
classification layers. Training options are set to use stochastic gradient
descent with momentum ('sgdm'). The defined layers and the combined datastore
are used to train the CNN model. After training, the model predicts labels for
the images loaded from the mixed folder. Based on these labels, the script
then determines the number and proportion of diseased leaves in the mixed
images.

Figure 10: Epochs – CNN

4.3.3 K NEAREST NEIGHBOR (KNN)
The K-Nearest Neighbor (KNN) algorithm is a reliable and easy-to-use ML
technique for solving regression and classification problems. Relying on the
notion of similarity, KNN assigns the label or value of a new data point from
its closest neighbors in the training dataset. It is widely applicable in
real-life situations because it is non-parametric, meaning that it makes no
underlying assumptions about the distribution of the data (unlike algorithms
such as GMM, which assume a Gaussian distribution of the data). Given
attribute-based prior data (the training data), new points can be classified
into groups.

The script loads images of both diseased and healthy leaves, combines them,
and extracts Histogram of Oriented Gradients (HOG) features. It trains a
K-Nearest Neighbors (KNN) classifier with these features. Next, it evaluates
the classifier on a set of mixed leaf photos and uses the predictions to
determine the percentage of unhealthy leaves in the mixed folder. It shows how
the KNN algorithm and HOG features can be used to categorize leaf images as
healthy or unhealthy.
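
The voting rule behind KNN can be sketched in base MATLAB as below. This is an
illustration only: the two-dimensional feature vectors, the query point and
k = 3 are hypothetical values, whereas the script described above works on HOG
features. The subtraction relies on implicit expansion, which is available in
MATLAB R2016b and later.

```matlab
% Hypothetical 2-D feature vectors (one per row); 0 = healthy, 1 = diseased.
trainFeatures = [1.0 1.2; 0.8 1.0; 1.1 0.9; 3.0 3.2; 3.1 2.9; 2.9 3.0];
trainLabels   = [0; 0; 0; 1; 1; 1];
query = [2.8 3.1];    % feature vector of the leaf to classify
k = 3;                % number of neighbours that vote

% Euclidean distance from the query to every training sample.
distances = sqrt(sum((trainFeatures - query) .^ 2, 2));

% Take the labels of the k closest samples and use the most frequent one.
[~, order] = sort(distances);
nearestLabels = trainLabels(order(1:k));
predicted = mode(nearestLabels);

fprintf('Predicted label: %d\n', predicted);
```

An odd value of k avoids tied votes when there are two classes.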

Figure 11: Comparison - KNN

4.3.4 DECISION TREE
The Decision Tree algorithm is a versatile and easy-to-use supervised learning
method that works for both regression and classification problems. It
recursively splits the dataset into subsets using the feature that best
separates the data into homogeneous groups. These splits produce a tree-like
structure in which each internal node represents a feature, each branch
represents the outcome of a split, and each leaf node holds the final
prediction or decision.

The script uses a Decision Tree based on Local Binary Pattern (LBP) features
to categorize leaf photos as healthy or unhealthy. It first establishes folder
paths for photos of healthy, diseased, and mixed leaves. It then uses
imageDatastore to load the photos from these directories and converts them to
grayscale. To train the decision tree classifier, LBP features are extracted
from the grayscale photos of healthy and diseased leaves and the corresponding
labels are created. Once the features and labels have been combined, the
fitctree function is used to train the decision tree. The trained decision
tree then predicts the labels for the LBP features extracted from the mixed
images. Based on the predictions, the script determines the proportion of
diseased leaves in the mixed photos and displays the result.
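
How a single split is chosen can be sketched as follows. The example searches
one feature for the threshold that gives the most homogeneous groups, measured
here by Gini impurity. It is an illustration under stated assumptions: the
feature values and labels are hypothetical, Gini impurity is one common
splitting criterion chosen for the example, and the full tree in the script is
built by fitctree rather than by this loop.

```matlab
% Hypothetical single feature with labels 0 = healthy, 1 = diseased.
feature = [0.10; 0.15; 0.20; 0.55; 0.60; 0.70];
labels  = [0; 0; 0; 1; 1; 1];

% Gini impurity of a group of binary labels (0 means perfectly pure).
gini = @(y) 1 - mean(y) ^ 2 - (1 - mean(y)) ^ 2;

% Candidate thresholds are the midpoints between consecutive sorted values.
sorted = sort(feature);
candidates = (sorted(1:end-1) + sorted(2:end)) / 2;

bestImpurity = inf;
bestThreshold = NaN;
for t = candidates'
    left  = labels(feature <= t);
    right = labels(feature > t);
    % Impurity of the split, weighted by the size of each side.
    weighted = (numel(left) * gini(left) + numel(right) * gini(right)) ...
        / numel(labels);
    if weighted < bestImpurity
        bestImpurity = weighted;
        bestThreshold = t;
    end
end

fprintf('Best threshold: %.3f (weighted Gini %.3f)\n', ...
    bestThreshold, bestImpurity);
```

A tree repeats this search on each resulting subset until the groups are pure
or a stopping rule is reached.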

Figure 12: Comparison – Decision Tree

4.4 ANALYSIS OF DATASETS

4.4.1 APPLE DATASET


A total of 9714 photos were used from the collection; 2510 of these show apple
leaves in a healthy state, and 7204 show diseased leaves. We have further
categorised the photos of diseased leaves into Apple Scab (2520 photos), Black
Rot (2484 photos), and Cedar Apple Rust (2200 photos).
Using our system, we conducted the analysis with four distinct algorithms, and
the results are shown below.

[Bar chart: APPLE LEAVES DISEASES. Categories: Cedar Rust, Apple Scab, Black
Rot. Series: SVM, KNN, DT, CNN. Vertical axis from 0 to 60.]

Figure 13: Comparison for Apple Leaves

4.4.2 GRAPES DATASET
A total of 9027 photos were used from the collection; 2115 of these images
show grape leaves in good condition, and 6912 show diseased leaves. We have
further categorised the photos of diseased leaves into Black Rot (2360
images), Black Measles (2400 images), and Leaf Blight (2152 images).
Using our system, we conducted the analysis with four distinct algorithms, and
the results are shown below.

[Bar chart: GRAPES LEAVES DISEASES. Categories: Black Rot, Black Measles, Leaf
Blight. Series: SVM, KNN, DT, CNN. Vertical axis from 0 to 60.]

Figure 14: Comparison for Grapes Leaves

4.4.3 MANGO DATASET
We used a total of 3000 photos from the dataset, of which 500 showed mango
leaves in good health and the remaining 2500 showed diseased leaves. We have
further categorised the photos of diseased leaves into 500 images each of
anthracnose, bacterial canker, die back, gall midge, and powdery mildew.
Using our system, we conducted the analysis with four distinct algorithms, and
the results are shown below. The best method for classifying mango leaf
diseases was identified by comparing and analysing the output of each
algorithm. Our results show how well each algorithm performs and whether it is
appropriate for this purpose.

[Bar chart: MANGO LEAVES DISEASES. Categories: SVM, KNN, DT, CNN. Series:
Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, Powdery
Mildew, Sooty Mould. Vertical axis from 0 to 80.]

Figure 15: Comparison for Mango Leaves

4.4.4 POTATO DATASET


A total of 2152 photos from the dataset were used, of which 2000 images showed
diseased leaves and 152 images showed healthy potato leaves. We have further
categorised the photos of diseased leaves into Early Blight, which includes
roughly 1000 photos, and Late Blight, which includes 1000 images.
Using our system, we conducted the analysis with four distinct algorithms, and
the results are shown below.

[Bar chart: POTATO LEAVES DISEASES. Categories: Early Blight, Late Blight.
Series: SVM, KNN, DT, CNN. Vertical axis from 50.9 to 51.6.]

Figure 16: Comparison for Potato Leaves

4.4.5 TOMATO DATASET


We used a total of 11807 photographs from the dataset, of which 1146 show
healthy tomato leaves and 10661 show diseased tomato leaves. We have further
categorised the photos of diseased leaves according to the specific disease:
1532 photos show Bacterial Spot, 720 show Early Blight, 1376 show Late Blight,
686 show Leaf Mould, 1208 show Spider Mites, 1012 show Target Spot, 269 show
Mosaic Virus, and 3858 show Yellow Leaf Curl.
Using our system, we conducted the analysis with four distinct algorithms, and
the results are shown below.

[Bar chart: TOMATO LEAVES DISEASES. Categories: SVM, KNN, DT, CNN. Series:
Bacterial Spot, Early Blight, Late Blight, Leaf Mold, Spider Mites, Target
Spot, Mosaic Virus, Yellow Leaf. Vertical axis from 0 to 90.]

Figure 17: Comparison for Tomato Leaves

CHAPTER 5
TESTING

5.1 TESTING OF THE APPLICATION


Our application was built with the MATLAB App Designer tool. The GUI contains
two buttons and one drop-down menu. The user first chooses the input image
folder by clicking the "Select Dataset Folder" button; once a folder has been
selected, its name appears below the button. The type of disease is then
selected from the drop-down menu, and the chosen disease type is shown beneath
it. The application then automatically indicates the algorithm that will be
used for the prediction, based on the analysis of our previously trained data.
To view the result, the user clicks the "Start Prediction" button, and the
result appears on the screen ten to fifteen seconds later. In this way, the
application predicts the disease percentage for the set of images submitted by
the user.

Figure 18: MATLAB App for Apple Leaves Disease Prediction

Figure 19: MATLAB App for Grape Leaves Disease Prediction

CHAPTER 6
CONCLUSION AND FUTURE
ENHANCEMENTS
6.1 CONCLUSION
The use of machine learning (ML) algorithms in modern agricultural practices
is expected to bring about a paradigm change in crop management by providing
more proactive and data-driven approaches. Early detection of plant diseases
is necessary for managing crop infections and assuring the quality of
agricultural goods. ML models have significantly changed how disease is
understood in agriculture and have opened up new avenues for monitoring and
prevention. This study examines a number of techniques for classifying and
identifying plant diseases, based on a study of healthy and diseased leaves.
We have demonstrated the disease percentages within various datasets. In the
Potato Leaves dataset, Early Blight and Late Blight together account for
86.81%, which matches the predictions of the SVM and CNN algorithms. We
therefore conclude that both SVM and CNN predict disease percentages for
potato leaves effectively. For Guava Leaves, Red Rust disease is determined to
be 40.85%, a figure that matches the CNN algorithm, so we conclude that CNN is
well suited to predicting disease percentages for guava leaves. In the Apple
Leaves dataset, Scab constitutes 47.39%, Black Rot 47.06%, and Cedar Rust
28.31%. We conclude that SVM is suitable for predicting Scab and Black Rot,
while CNN is best for Cedar Rust. In the Mango Leaves dataset, diseased leaves
constitute approximately 50% across all types of disease. We find that CNN can
effectively predict Anthracnose, Die Back, Gall Midge, and Powdery Mildew,
while SVM is recommended for Powdery Mildew and Decision Tree for Bacterial
Canker. In the Grapes Leaves dataset, Black Rot contributes roughly 73.61%,
Black Measles around 76.58%, and Leaf Blight about 71.58%. We conclude that
SVM can effectively predict all three diseases, while CNN can also predict the
Black Rot and Black Measles percentages. Lastly, in the Tomato Leaves dataset,
Bacterial Spot accounts for 57.21%, Early Blight 38.59%, Late Blight 54.56%,
Leaf Mold 37.45%, Spider Mites 51.32%, Target Spot 46.90%, Mosaic Virus
19.01%, and Yellow Leaf Curl 77.10%. SVM is therefore deemed suitable for
predicting Mosaic Virus, Target Spot, Bacterial Spot, Leaf Mold, and Early
Blight; Decision Tree for Late Blight; and CNN for Bacterial Spot, Leaf Mold,
Spider Mites, Mosaic Virus, and Yellow Leaf Curl.

6.2 FUTURE ENHANCEMENTS


We aim to build a drone-based version of this application in which images are
captured by live streaming and passed through our system to produce accurate
output. We believe this would benefit a large number of people worldwide and
extend the system's influence. We also plan to develop hybrid techniques that
use a combination of algorithms, which could make it easier for farmers and
agricultural organizations to identify diseases in plant leaves. The idea can
be developed further by using drones to record videos of agricultural areas,
uploading those videos directly into the system, splitting them into images,
and then applying machine learning (ML) techniques to those images to generate
precise information about plant leaf diseases. This methodology supports
sustainable farming practices and provides significant support to the
agriculture sector.

APPENDICES
APPENDIX 1

SVM:
% Load the leaf images from the Healthy and Diseased folders
healthyFolder = 'Healthy'; % Path to Healthy folder
diseasedFolder = 'Diseased'; % Path to Diseased folder

healthyImages = imageDatastore(healthyFolder);
diseasedImages = imageDatastore(diseasedFolder);

% Create labels for the images (0 for healthy, 1 for diseased)
healthyLabels = zeros(numel(healthyImages.Files), 1);
diseasedLabels = ones(numel(diseasedImages.Files), 1);

% Combine the data and labels
allImages = [healthyImages.Files; diseasedImages.Files];
allLabels = [healthyLabels; diseasedLabels];

% Initialize an array to store color histograms
numBins = 256; % Number of bins for each color channel
colorHistograms = zeros(numel(allImages), 3 * numBins); % 3 channels (RGB)

% Compute color histograms for each image (images are assumed to be RGB)
for i = 1:numel(allImages)
    img = imread(allImages{i});
    % Split the image into RGB channels
    redChannel = img(:, :, 1);
    greenChannel = img(:, :, 2);
    blueChannel = img(:, :, 3);
    % Compute histograms for each channel
    redHist = imhist(redChannel, numBins);
    greenHist = imhist(greenChannel, numBins);
    blueHist = imhist(blueChannel, numBins);
    % Concatenate histograms into one row of the feature matrix
    colorHistograms(i, :) = [redHist; greenHist; blueHist]';
end

% Load the mixed test images
mixedFolder = 'Mixed'; % Path to Mixed folder
mixedImages = imageDatastore(mixedFolder);

% Extract features (color histograms) from the mixed test images
mixedFeatures = zeros(numel(mixedImages.Files), 3 * numBins);
for i = 1:numel(mixedImages.Files)
    img = imread(mixedImages.Files{i});
    redChannel = img(:, :, 1);
    greenChannel = img(:, :, 2);
    blueChannel = img(:, :, 3);
    redHist = imhist(redChannel, numBins);
    greenHist = imhist(greenChannel, numBins);
    blueHist = imhist(blueChannel, numBins);
    mixedFeatures(i, :) = [redHist; greenHist; blueHist]';
end

% Train the SVM model
svmModel = fitcsvm(colorHistograms, allLabels, 'KernelFunction', 'linear');

% Predict the labels for the mixed test images
mixedPredictions = predict(svmModel, mixedFeatures);

% Calculate the percentage of diseased leaves in the Mixed folder
numDiseased = sum(mixedPredictions);
totalLeaves = numel(mixedImages.Files);
percentageDiseased = (numDiseased / totalLeaves) * 100;
fprintf('Percentage of diseased leaves in the Mixed folder: %.2f%%\n', ...
    percentageDiseased);

CNN:
% Define paths to the folders
healthy_folder = 'Healthy';
diseased_folder = 'Diseased';
mixed_folder = 'Mixed';

% Load and resize images from Healthy folder
healthy_images = imageDatastore(healthy_folder, 'LabelSource', ...
    'foldernames', 'ReadFcn', @resizeImage);

% Load and resize images from Diseased folder
diseased_images = imageDatastore(diseased_folder, 'LabelSource', ...
    'foldernames', 'ReadFcn', @resizeImage);

% Combine the datastores
all_images = imageDatastore(cat(1, healthy_images.Files, ...
    diseased_images.Files), 'LabelSource', 'foldernames', 'ReadFcn', ...
    @resizeImage);

% Define CNN architecture
layers = [
    imageInputLayer([256 256 3])
    convolution2dLayer(3, 16, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 32, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer];

% Set training options
% (validation reuses the training images, so the validation accuracy
% shown in the progress plot is an optimistic estimate)
options = trainingOptions('sgdm', ...
    'MaxEpochs', 10, ...
    'InitialLearnRate', 0.001, ...
    'Shuffle', 'every-epoch', ...
    'ValidationData', all_images, ...
    'ValidationFrequency', 10, ...
    'Verbose', false, ...
    'Plots', 'training-progress');

% Train the CNN
net = trainNetwork(all_images, layers, options);

% Load and resize images from Mixed folder
mixed_images = imageDatastore(mixed_folder, 'ReadFcn', @resizeImage);

% Perform prediction
predicted_labels = classify(net, mixed_images);

% Count the number of diseased leaves
num_diseased_leaves = sum(predicted_labels == 'Diseased');

% Calculate the percentage of diseased leaves
total_images = numel(predicted_labels);
percentage_diseased_leaves = (num_diseased_leaves / total_images) * 100;
fprintf('Percentage of diseased leaves in the mixed folder: %.2f%%\n', ...
    percentage_diseased_leaves);

% Function to read an image and resize it to the network input size
function img = resizeImage(filename)
    img = imread(filename);
    img = imresize(img, [256 256]);
end

KNN:
% Load images
healthyImages = imageDatastore('Healthy', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
diseasedImages = imageDatastore('Diseased', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
mixedImages = imageDatastore('Mixed', 'IncludeSubfolders', true);

% Combine healthy and diseased images into a single labeled datastore
% built from the merged file lists
allImages = imageDatastore([healthyImages.Files; diseasedImages.Files], ...
    'LabelSource', 'foldernames');

% Common image size (assumed here to be 256 x 256): the length of a HOG
% feature vector depends on the image size, so all images are resized
% before feature extraction
imageSize = [256 256];

% Extract HOG features
trainingFeatures = [];
trainingLabels = [];

for i = 1:numel(allImages.Files)
    img = readimage(allImages, i);
    img = imresize(img, imageSize);
    features = extractHOGFeatures(img);
    trainingFeatures = [trainingFeatures; features];
    trainingLabels = [trainingLabels; allImages.Labels(i)];
end

% Train the classifier
knnModel = fitcknn(trainingFeatures, trainingLabels, 'NumNeighbors', 5);

% Test the classifier on mixed images
testFeatures = [];
for i = 1:numel(mixedImages.Files)
    img = readimage(mixedImages, i);
    img = imresize(img, imageSize);
    features = extractHOGFeatures(img);
    testFeatures = [testFeatures; features];
end

predictions = predict(knnModel, testFeatures);

% Calculate the percentage of diseased leaves
diseasedCount = sum(predictions == 'Diseased');
totalLeaves = numel(mixedImages.Files);
percentageDiseased = (diseasedCount / totalLeaves) * 100;

fprintf('Percentage of diseased leaves in the mixed folder is %f%%\n', ...
    percentageDiseased);

Decision Tree:
% Add your folder paths
healthyFolder = 'Healthy';
diseasedFolder = 'Diseased';
mixedFolder = 'Mixed';

% Read images from the folders
healthyImages = imageDatastore(healthyFolder);
diseasedImages = imageDatastore(diseasedFolder);
mixedImages = imageDatastore(mixedFolder);

% Initialize feature arrays
healthyFeatures = [];
diseasedFeatures = [];

% Extract LBP features for healthy images
for i = 1:numel(healthyImages.Files)
    I = imread(healthyImages.Files{i});
    I = rgb2gray(I); % Convert to grayscale
    features = extractLBPFeatures(I);
    healthyFeatures = [healthyFeatures; features];
end

% Extract LBP features for diseased images
for i = 1:numel(diseasedImages.Files)
    I = imread(diseasedImages.Files{i});
    I = rgb2gray(I); % Convert to grayscale
    features = extractLBPFeatures(I);
    diseasedFeatures = [diseasedFeatures; features];
end

% Create labels (0 for healthy, 1 for diseased) so that summing the
% predictions below counts the diseased leaves
healthyLabels = zeros(size(healthyFeatures, 1), 1);
diseasedLabels = ones(size(diseasedFeatures, 1), 1);

% Combine data and labels
features = [healthyFeatures; diseasedFeatures];
labels = [healthyLabels; diseasedLabels];

% Train the decision tree
tree = fitctree(features, labels);

% Now apply the trained model to the mixed data
mixedFeatures = [];
for i = 1:numel(mixedImages.Files)
    I = imread(mixedImages.Files{i});
    I = rgb2gray(I); % Convert to grayscale
    features = extractLBPFeatures(I);
    mixedFeatures = [mixedFeatures; features];
end

predictedLabels = predict(tree, mixedFeatures);

% Calculate the percentage of diseased leaves
percentageDiseased = sum(predictedLabels) / length(predictedLabels) * 100;

% Display the result
disp(['The percentage of diseased leaves is: ', ...
    num2str(percentageDiseased), '%']);

APPENDIX 2

ACHIEVEMENTS

• Secured SECOND PLACE for presenting “AGRI DETECT: MACHINE
LEARNING FOR TIMELY PLANT DISEASE IDENTIFICATION USING
LEAVES” in the Technical Event VISION-X at Nutpam 2K23, a National
Level Tech Fest held on 17th October 2023 at Sri Sairam Institute
of Technology.
• Received the BEST PAPER AWARD for presenting “AGRI DETECT:
MACHINE LEARNING FOR TIMELY PLANT DISEASE IDENTIFICATION
USING LEAVES” at the National Conference on Machine
Learning Applications in Communications, Networking
and Technology (NCMLACNT’24) held on 6th April 2024
at RMD Engineering College.

REFERENCES

[1] R. Chapaneri, M. Desai, A. Goyal, S. Ghose and S. Das, "Plant Disease
Detection: A Comprehensive Survey," 2020 3rd International
Conference on Communication System, Computing and IT Applications
(CSCITA), Mumbai, India, pp. 220-225, 2020.
[2] D. Gosai, B. Kaka, D. Garg, R. Patel and A. Ganatra, "Plant Disease
Detection and Classification Using Machine Learning
Algorithm," 2022 International Conference for Advancement in
Technology (ICONAT), Goa, India, pp. 1-6, 2022.
[3] E. Hirani, V. Magotra, J. Jain and P. Bide, "Plant Disease Detection
Using Deep Learning," 2021 6th International Conference for
Convergence in Technology (I2CT), Maharashtra, India, pp. 1-4, 2021.
[4] M. A. Jasim and J. M. AL-Tuwaijari, "Plant Leaf Diseases Detection
and Classification Using Image Processing and Deep Learning
Techniques," 2020 International Conference on Computer Science and
Software Engineering (CSASE), Duhok, Iraq, pp. 259-265, 2020.
[5] J. Kolli, D. M. Vamsi and V. M. Manikandan, "Plant Disease Detection
using Convolutional Neural Network," 2021 IEEE Bombay Section
Signature Conference (IBSSC), Gwalior, India, pp. 1-6, 2021.
[6] S. Kumar, K. Prasad, A. Srilekha, T. Suman, B. P. Rao and J. N. Vamshi
Krishna, "Leaf Disease Detection and Classification based on Machine
Learning," 2020 International Conference on Smart Technologies in
Computing, Electrical and Electronics (ICSTCEE), Bengaluru, India,
pp. 361-365, 2020.
[7] R. Kundu, U. Chauhan and S. P. S. Chauhan, "Plant Leaf Disease
Detection using Image Processing," 2022 2nd International Conference
on Innovative Practices in Technology and Management (ICIPTM),
Gautam Buddha Nagar, India, pp. 393-396, 2022.
[8] L. Li, S. Zhang and B. Wang, "Plant Disease Detection and
Classification by Deep Learning—A Review," in IEEE Access, vol. 9,
pp. 56683-56698, 2021.
[9] P. Nayar, S. Chhibber and A. K. Dubey, "An Efficient Algorithm for
Plant Disease Detection Using Deep Convolutional Networks," 2022
14th International Conference on Computational Intelligence and
Communication Networks (CICN), Al-Khobar, Saudi Arabia,
pp. 156-160, 2022.
[10] G. K. Sandhu and R. Kaur, "Plant Disease Detection Techniques: A
Review," 2019 International Conference on Automation,
Computational and Technology Management (ICACTM), London, UK,
pp. 34-38, 2019.
[11] R. S. K. R, A. Singh, H. J. S V, A. D and J. S. Jayasree, "Plant Disease
Detection and Diagnosis using Deep Learning," 2022 International
Conference for Advancement in Technology (ICONAT), Goa, India, pp.
1-6, 2022.
[12] M. Ş. Soyer, C. Yılmaz, İ. M. Ozcan, F. Cogen and T. Ç. Yıldız,
"LeafLife: Deep Learning Based Plant Disease Detection
Application," 2021 13th International Conference on Electrical and
Electronics Engineering (ELECO), Bursa, Turkey, pp. 398-402, 2021.
[13] A. Suljović, S. Čakić, T. Popović and S. Šandi, "Detection of Plant
Diseases Using Leaf Images and Machine Learning," 2022 21st
International Symposium INFOTEH-JAHORINA (INFOTEH), East
Sarajevo, Bosnia and Herzegovina, pp. 1-4, 2022.
[14] S. S. Harakannanavar, J. M. Rudagi, V. I. Puranikmath, A. Siddiqua
and R. Pramodhini, "Plant leaf disease detection using computer
vision and machine learning algorithms," Global Transitions
Proceedings, vol. 3, no. 1, 2022, ISSN 2666-285X.
[15] D. Varshney, B. Babukhanwala, J. Khan, D. Saxena and A. K. Singh,
"Plant Disease Detection Using Machine Learning Techniques," 2022
3rd International Conference for Emerging Technology (INCET),
Belgaum, India, pp. 1-5, 2022.
