
LEAF PATHOGEN IDENTIFICATION USING MACHINE LEARNING

A PROJECT REPORT

Submitted by

JEYAPRIYA S (510120205006)
RAMYA A (510120205011)

In partial fulfilment for the award of the degree

of

BACHELOR OF ENGINEERING

in

INFORMATION TECHNOLOGY

ADHIPARASAKTHI COLLEGE OF ENGINEERING, KALAVAI

ANNA UNIVERSITY::CHENNAI 600 025

MAY 2024
ANNA UNIVERSITY::CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “LEAF PATHOGEN IDENTIFICATION USING


MACHINE LEARNING” is the bonafide work of JEYAPRIYA S (510120205006),
RAMYA A (510120205011) who carried out the project work under my supervision.

SIGNATURE                                 SIGNATURE
Mrs. S. Sharmila, MCA, M.E., (Ph.D.)      Mrs. S. Sharmila, MCA, M.E., (Ph.D.)
HEAD OF THE DEPARTMENT                    SUPERVISOR
Department of Information Technology      HOD & Asst. Professor
Adhiparasakthi College of Engineering     Department of Information Technology
G.B. Nagar, Kalavai.                      Adhiparasakthi College of Engineering
                                          G.B. Nagar, Kalavai.

Submitted for the project and Viva-voce held on

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

With the divine blessings of Goddess Adhiparasakthi, we express our deep
gratitude to His Holiness Arul Thiru Padma Shri Bangaru Adigalar, Founder
President, and Thirumathi Lakshmi Bangaru Adigalar, Vice President, for providing
an amazing environment for the development and promotion of undergraduate
education in our college under the ACMEC Trust.

We are very grateful to Sakthi Thirumathi Dr.B.Umadevi, Correspondent,


Adhiparasakthi College of Engineering, for her encouragement and inspiration. We are
very grateful to Sakthi Thiru R.Karunanidhi, Secretary for his continuous support.
We are highly indebted to our Principal Prof. Dr.S.Mohanamurugan for his valuable
guidance.

We wish to place our sincere gratitude to Prof. Mrs.S.Sharmila, Head of the


Department, Department of Information Technology, for her motivation and
permitting us to do this work.

We wish to extend gratitude to our Project Coordinator & Supervisor


Mrs.S.Sharmila, HOD & Assistant Professor, Department of Information Technology
for her kind guidance, help and suggestion for completion of the project successfully.

We are thankful to all teaching and non-teaching staff of our department for their
constant cooperation and encouragement in pursuing our project work.

ABSTRACT

Food production in India is largely dependent on cereal crops, including rice, wheat,
and various pulses. Rice is one of the staple foods of the world, but its production is
hampered by various kinds of rice diseases, among which leaf diseases are some of
the most important. Identifying rice leaf diseases is generally time-consuming and
laborious for farmers in remote areas because experts are unavailable. Even where
experts are available, detection is performed by the naked eye, which sometimes
causes misrecognition. An automated system can minimize these problems. In this
report, an automated system is proposed for diagnosing three common rice leaf
diseases (brown spot, leaf blast, and bacterial blight), and pesticides and/or
fertilizers are advised according to the severity of the disease. A deep learning
Convolutional Neural Network (CNN) is used to extract features from plant leaf
images; visual contents (colour, texture, and shape) serve as features for classifying
these diseases. The type of rice leaf disease is recognized by a Support Vector
Machine (SVM) classifier. After recognition, a remedy is suggested that can help
agriculture-related people and organizations take appropriate action against these
diseases.

TABLE OF CONTENTS

CHAPTER NO. TITLE PAGE NO.

TABLE OF CONTENTS v
LIST OF FIGURES vii
LIST OF SYMBOLS viii
LIST OF ABBREVIATIONS x

1 INTRODUCTION 1
1.1 AIM AND OBJECTIVE 1
1.2 PURPOSE AND SCOPE
2 LITERATURE SURVEY 4

3 EXISTING SYSTEM 7
3.1 EXISTING METHOD 7
3.2 DISADVANTAGES 7
4 PROPOSED SYSTEM 8
4.1 PROPOSED METHOD 8
4.2 ADVANTAGES 8
5 SYSTEM ANALYSIS 9
5.1 HARDWARE REQUIREMENTS 9
5.2 SOFTWARE REQUIREMENTS 9
5.3 DATABASE REQUIREMENTS 10
5.4 SOFTWARE SPECIFICATION 10
5.4.1 PYTHON 11
5.5 TOOLS AND LIBRARIES 14
5.5.1 PANDAS 14

5.5.2 PIL 15
5.5.3 PYTORCH 16
5.5.4 FLASK 17
5.5.5 NUMPY 18
5.5.6 CLICK 19
5.5.7 TORCH VISION 20
6 SYSTEM DESIGN 21
6.1 GENERAL 18
6.2 SYSTEM ARCHITECTURE 19
6.3 USE CASE DIAGRAM 20
6.4 CLASS DIAGRAM 21
6.5 SEQUENCE DIAGRAM 22
6.6 DATA FLOW DIAGRAM 25
6.7 ACTIVITY DIAGRAM
6.8 FLOWCHART
7 MODULES 26
7.1 LIST OF MODULES 26
7.2 MODULES DESCRIPTION 26
7.2.1 DATA COLLECTION & PRE-PROCESSING 26
7.2.2 MODEL CREATION 27
7.2.3 TRAINING AND EVALUATION 28
7.2.4 DEPLOYMENT
8 SYSTEM TESTING 29
8.1 GENERAL 29
8.2 TYPES OF TESTING
9 IMPLEMENTATION
10 CONCLUSION AND FUTURE ENHANCEMENT

10.1 CONCLUSION
10.2 FUTURE ENHANCEMENT

APPENDIX-1 SOURCE CODE


APPENDIX-2 SCREENSHOTS
APPENDIX-3 REFERENCES

LIST OF FIGURES

FIGURE NO. NAME OF THE FIGURES PAGE NO.


6.2 System Architecture 18

6.3 Use Case Diagram 19

6.4 Class Diagram 20

6.5 Sequence Diagram 21

6.6 Data Flow Diagram 22

6.7 Activity Diagram 22

6.8 Flow Chart 25

LIST OF SYMBOLS
S.NO NAME DESCRIPTION

1 Class: Represents a collection of similar entities grouped together; the notation shows the class name with its public (+) and private (-) attributes.

2 Association: Associations represent static relationships between classes; roles represent the way the two classes see each other.

3 Relation (user): Used for additional process communication.

4 Relation (extends): The extends relationship is used when one use case is similar to another use case but does a bit more.

5 Communication: Communication between various use cases.

6 Use case: Interaction between the system and the external environment.

7 Data Process/State: A circle in a DFD represents a state or process which has been triggered by some event or action.

8 External entity: Represents external entities such as keyboard, sensors, etc.

9 Object Lifeline: Represents the vertical dimension along which the object communicates.

10 Message: Represents the message exchanged.

LIST OF ABBREVIATIONS

 SVM - Support Vector Machine

 AI - Artificial Intelligence

 NN - Neural Network

 CNN - Convolutional Neural Network

 ANN - Artificial Neural Network

 GUI - Graphical User Interface

 BSD - Berkeley Segmentation Dataset

 PIL - Python Imaging Library

 SQL - Structured Query Language

 CLICK - Command Line Interface Creation Kit

CHAPTER 1

INTRODUCTION

Agribusiness has developed into a significant source of economic
improvement, and in India 80% of the population is dependent on agriculture.
Depending on the type of soil, local climate, and economic value, the farmer
selects the best crop. All too often, farmers take their own lives as a result of
production losses, because they are unable to repay the bank loans they have
taken out for farming. The environment is changing steadily right now, which
harms the crops and pushes farmers into debt and despair. When mathematical
or statistical techniques are applied to data, these risks can be reduced, and by
doing so we can tell the farmer about plant diseases that affect his agricultural
area.

The agricultural industries started looking for new ways to enhance food
production as a result of the expanding population, climatic changes, and poor
governance. Researchers are working to create new, capable, and distinctive
technologies for producing outcomes with great efficiency. A huge harvest can
be produced using a variety of approaches in the agricultural sector, and
precision agriculture is the most recent and beneficial of them all. Based on the
data gathered, farmers can use precision agriculture technology to gain insights
for increasing agricultural yield. Precision agriculture can be utilised for a
variety of purposes, including the identification of plant pests, the detection of
weeds, the estimation of yield, the detection of plant diseases, and more. The
categorization of agricultural products is done to stop losses in yield and
quantity. If this evaluation is not made properly, it has major effects on plants
and hurts the quality or productivity of the corresponding product. Diseases in
crops are causing several problems, including low productivity and financial
losses to ranchers and farmers.

Leaf disease identification is a crucial aspect of plant health management
and agricultural sustainability. It involves the identification and classification of
various diseases that affect the leaves of plants, such as those caused by fungi,
bacteria, viruses,
and nutrient deficiencies. Early detection and accurate diagnosis of these
diseases are essential for timely intervention and effective control measures.
Various methods, including visual inspection, symptomatology, and advanced
diagnostic tools like molecular techniques and machine learning algorithms, are
employed to identify leaf diseases. Proper identification allows farmers and
gardeners to implement appropriate treatment strategies, including chemical
treatments, cultural practices, and biological control methods, to mitigate the
spread of disease and ensure the health and productivity of plants.

Image processing techniques are therefore required to diagnose diseases
automatically. They help both in diagnosing diseases accurately and in
categorising them, and they make early, accurate identification feasible. As a
result, both the productivity and the quality of the yield rise, and human labour
is greatly reduced. Many agricultural applications, such as the classification of
various diseases and the identification of plant leaflets, make extensive use of
image processing techniques.
In general, image processing techniques involve the following steps to identify
diseases:
• Image acquisition
• Image pre-processing
• Segmentation
• Feature extraction
• Classification
Different methods have been used for segmentation and feature extraction, and
various classifiers are available for classification.
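The steps above can be sketched with PIL and NumPy (both listed later among the project's tools). This is only a minimal illustration: the green-dominance segmentation rule and the mean-colour feature are assumptions made for the sketch, not the report's actual parameters.

```python
import numpy as np
from PIL import Image

def extract_features(img: Image.Image) -> np.ndarray:
    # Pre-processing: normalise to a fixed input size
    img = img.convert("RGB").resize((224, 224))
    arr = np.asarray(img, dtype=np.float32) / 255.0

    # Segmentation (toy rule): pixels where green exceeds the mean channel
    # value are treated as "leaf"
    leaf_mask = arr[..., 1] > arr.mean(axis=-1)

    # Feature extraction: mean colour of the segmented region
    if leaf_mask.any():
        return arr[leaf_mask].mean(axis=0)
    return arr.reshape(-1, 3).mean(axis=0)

# Acquisition stand-in: a synthetic green "leaf" image
demo = Image.new("RGB", (64, 64), (30, 160, 40))
feats = extract_features(demo)
print(feats.shape)  # (3,)
```

The resulting feature vector would then be handed to a classifier (the classification step), as described in the chapters that follow.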

1.1 AIM AND OBJECTIVE

The major goal of this study is to develop a model that can distinguish
between healthy and diseased crop leaves and, in the event that the crop has a
disease, to identify which disease it is. This study used 54,306 photos, including
pictures of ill and healthy plant leaves, to train a convolutional neural network
model covering 14 crop species, 26 diseases, and 38 classes. On a held-out test
set, this trained model has an accuracy of 99.35%. In this process, the collected
leaves are analysed using a number of ResNet-18 models, trained using the
transfer learning approach. The first layer can be used to identify different leaf
kinds, while the supporting layers can be used to screen for potential plant
illnesses. Deep learning produces results with higher accuracy, which may be
used to diagnose crop diseases and analyse an image down to its smallest
pixel-level components, a level of detail impossible to study with the human eye.

1.2 PURPOSE AND SCOPE

In this methodology, plant diseases are detected and classified using image
processing and machine learning techniques, respectively. Supervised learning
and unsupervised learning are the two main classifications of machine learning.
In the supervised learning method the label values are known; regression and
classification are two examples of supervised learning techniques. In the
unsupervised learning approach the label values are unknown; clustering and
association are illustrations of unsupervised learning techniques. Several fields,
including computing, biology, marketing, medical diagnosis, game playing, etc.,
can benefit from machine learning. Many methods, including naive Bayes,
k-means clustering, support vector machines, artificial neural networks, decision
trees, and random forests, among others, are supported by machine learning.
Data collection, dataset organisation, feature extraction, pre-processing, feature
selection, selecting and implementing machine learning algorithms, and
evaluation of execution are some of the fundamental phases in machine learning.
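The supervised classification idea above, with the SVM classifier named in the abstract, can be sketched as follows. scikit-learn is not in the report's tool list, so its use here is an assumption for illustration, and the two "classes" are synthetic colour-feature clusters, not real disease data.

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
# Synthetic 3-D colour features for two hypothetical classes
healthy = rng.normal(loc=[0.2, 0.7, 0.2], scale=0.05, size=(50, 3))
brown_spot = rng.normal(loc=[0.5, 0.4, 0.2], scale=0.05, size=(50, 3))

X = np.vstack([healthy, brown_spot])
y = np.array([0] * 50 + [1] * 50)  # labels are known: supervised learning

clf = svm.SVC(kernel="rbf").fit(X, y)
pred = clf.predict([[0.2, 0.7, 0.2], [0.5, 0.4, 0.2]])
print(pred)  # [0 1]
```

Because the label values `y` are supplied during fitting, this is supervised learning; a clustering method such as k-means would instead group `X` without any labels.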

Beyond the arts, image processing has more utilitarian applications. It now
serves the following specialised functions in agricultural applications:
• To identify sick leaves, stems, and fruits;

• to quantify the disease's impact;

• to identify the form and colour of the infected areas;

• to identify the size and shape of the fruits; etc.

CHAPTER 2

LITERATURE SURVEY

RICE LEAF DISEASE PREDICTION USING MACHINE LEARNING


Using different image processing approaches, Xu Piefeng et al. [13] used
SVM and NN to identify wheat leaf diseases, with accuracy ranging from 92.3
percent to 96.2 percent. Shah JP et al. [14] used a similar method and conducted a
study of rice plant infection categorization techniques. The survey was conducted
based on the variety of classes, different segmentation approaches, and
classification accuracy. They stated that the greatest accuracy may be attained by
applying SVM and NN methods to image features. Shen Wei-zheng et al. [15]
used a back propagation neural network to study the automated detection of
sheath blight and rice blast; the performance of the BP neural network model was
used to tune its parameters, and according to the results the model has decent
recognition performance. To detect cucumber leaf disease, R. Kawasaki et al. [16]
suggested a CNN-based method that achieves 94.9 percent accuracy in
differentiating between two diseased classes and a non-diseased class. Ze-xin
Guan et al. [17] employed pattern recognition and image processing technologies
to segment rice disease pictures and extract characteristics, resulting in the
creation of a rice disease detection system. Singh V et al. [18] demonstrated an
image segmentation-based method for detecting plant infection; they conducted
trials on the leaves of fruits and vegetables and proposed the use of NN, fuzzy
logic, and other AI methods to increase the accuracy of the present model. In a
study, Madiwalar Shriroop et al. [19] analyzed a variety of feature extraction
approaches and discovered that the Gabor filter is effective for detecting tiny
spots on leaves, whereas GLCM features [20] are effective for detecting all other
sorts of infections.

CHAPTER 3

EXISTING SYSTEM

3.1 EXISTING METHOD

In the existing system, a model was trained on a public dataset containing
54,306 photos of diseased and healthy plant leaves taken under controlled
conditions, to recognise several crops and 26 diseases.
The existing system's paper used the ResNet algorithm, which produced
highly accurate findings and was able to identify more diseases from different
crops. The ResNet approach used several techniques, including weight decay,
gradient clipping, and learning rate scheduling. The experiment used a Kaggle
dataset whose 87k RGB photos of healthy and diseased crop leaves are arranged
into 38 different classes. An 80% share of this dataset is used for the training task
and 20% for the testing task, along with an additional, separately indexed set of
33 test photos created later for prediction.
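The 80/20 split described above can be sketched with PyTorch's data utilities. The dataset here is a small random stand-in (100 samples instead of 87k) so the sketch stays self-contained.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Illustrative stand-in for the image dataset: 100 random 224x224 RGB
# tensors with labels drawn from the 38 classes mentioned above
data = TensorDataset(torch.randn(100, 3, 224, 224),
                     torch.randint(0, 38, (100,)))

# 80% for training, 20% for testing
n_train = int(0.8 * len(data))
train_set, test_set = random_split(data, [n_train, len(data) - n_train])
print(len(train_set), len(test_set))  # 80 20
```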
3.2 DISADVANTAGES OF EXISTING SYSTEM:

• High computational complexity – residual neural networks frequently
need a lot of computing power and may not be appropriate for all tasks.
• Residual networks can fit the underlying data patterns too closely, which
might result in overfitting and subpar generalisation.
• Large quantities of memory are needed to hold the parameters and
weights that are required by residual networks.

CHAPTER 4
PROPOSED SYSTEM

4.1 PROPOSED SYSTEM


The main objective of this project is to create a model that can
differentiate between healthy and diseased crop leaves and, if the crop has a
disease, determine which disease it is. This study examined 70,295 plant
photos, including those of tomato, blueberry, orange, peach, corn (maize),
potato, raspberry, soybean, and strawberry. The referenced dataset is taken
from the well-known public source Kaggle. For the purpose of identifying and
categorising plant diseases in the suggested system, we employed the
InceptionV3 architecture. The first phase of the system is dataset loading; this
dataset contains photographs of both healthy and diseased plants. The second
step is preprocessing, in which only the relevant data is kept and irregular and
noisy data is removed from the dataset. The next step is feature extraction,
which greatly benefits image classification and has several uses; this method
found that morphological features give preferable results over other aspects.
The dataset can also be used to identify rare plants and rare diseases. The
classification technique is the next phase of the system. In the final stage, users
can recognise and describe the plant infection; this represents the system's
ultimate end stage. The proposed system had a validation accuracy of 89.21%
and a training accuracy of 91.34%.

4.2 ADVANTAGES OF PROPOSED SYSTEM

 The suggested system model is more effective.
 Compared to the Inception V1 and V2 models, the suggested system
model has a deeper network, although its speed is unaffected.
 The suggested method provides an accurate identification strategy for
plant leaf diseases with good accuracy and leverages auxiliary classifiers
as regularizers.
 Reduces labor costs and the need for expensive chemical treatments,
enhancing profitability.

CHAPTER 5

SYSTEM ANALYSIS

5.1 HARDWARE REQUIREMENTS

The hardware requirements may serve as the basis for a contract for the
implementation of the system and should therefore be a complete and consistent
specification of the whole system. They are used by software engineers as the
starting point for the system design. They state what the system should do, not
how it should be implemented.

 Intel i3 or equivalent.
 4GB RAM (8GB preferred)
 256GB SSD – 10GB available hard disk space.
 Proper internet connection.

5.2 SOFTWARE REQUIREMENTS

The software requirements document is the specification of the system. It
should include both a definition and a specification of requirements. It is a
statement of what the system should do rather than how it should do it. The
software requirements provide a basis for creating the software requirements
specification.

 OS: Windows 10 / 11 64 bit.


 Pycharm / VS Code
 Python 3.4 or later
 Deep learning framework: TensorFlow or PyTorch
 Image processing library: PIL (pillow)

5.3 DATABASE REQUIREMENTS

The dataset required for training the model is:

5.4 SOFTWARE SPECIFICATION

5.4.1 PYTHON

Python, conceived by Guido van Rossum in the late 1980s, is a high-level,


interpreted programming language that was officially released in 1991. Guido van
Rossum aimed to create a language that prioritized code readability and
simplicity, making it accessible to both novice and experienced programmers
alike. Python's development was influenced by various programming languages,
including ABC, Modula-3, and C. Since its initial release in 1991, Python has
undergone significant evolution, growing in popularity and versatility. Despite its
early beginnings, Python has managed to adapt and thrive, becoming one of the
most widely used programming languages across diverse domains, such as web
development, data science, artificial intelligence, scientific computing, and more.
Python's success can be attributed to several key factors. Firstly, its clear and
intuitive syntax makes it easy to learn and understand, even for those new to
programming. The language's readability encourages good coding practices and
facilitates collaboration among developers. Python boasts a rich ecosystem of
libraries and frameworks that extend its functionality and facilitate various tasks.
Over the years, developers have created numerous libraries for specialized
purposes, ranging from data analysis and visualization to machine learning and
web development. This extensive library support enables developers to build
complex applications efficiently, without needing to reinvent the wheel.

In terms of data types and control structures, Python offers a flexible and
powerful set of features. It supports a variety of data types, including integers,
floating-point numbers, complex numbers, strings, lists, tuples, and dictionaries.
These data types allow developers to handle different kinds of data effectively,
whether it's numerical data, text, or structured collections of items. Functions play
a pivotal role in Python programming, allowing developers to encapsulate
reusable blocks of code. By defining functions, developers can write modular
and maintainable code, promoting code efficiency and readability. Python is a
general-purpose, interpreted, dynamically typed programming language.
Although compilation is a step, Python is considered an interpreted language
rather than a compiled one: code written in .py files is first compiled to bytecode,
which is cached in .pyc or .pyo files. Python source code is converted to bytecode
rather than machine code, as in C++; bytecode is a low-level set of instructions
that the interpreter can execute.

SOME FEATURES OF PYTHON

 Free and Open Source: The Python language is freely available on the
official website, where it can be downloaded. Since it is open source, the
source code is also available to the public, so you can download it, use it,
and share it.
 Easy to code: Python is a high-level programming language. Python is
very easy to learn the language as compared to other languages like C, C#,
Javascript, Java, etc. It is very easy to code in the Python language and
anybody can learn Python basics in a few hours or days. It is also a
developer-friendly language.
 Object-Oriented Language: One of the key features of Python is Object-
Oriented programming. Python supports object-oriented language and
concepts of classes, object encapsulation, etc.
 GUI Programming Support : Graphical User interfaces can be made
using a module such as PyQt5, PyQt4, wxPython, or Tk in Python. PyQt5
is the most popular option for creating graphical apps with Python.

 High-Level Language: Python is a high-level language. When we write
programs in Python, we do not need to remember the system architecture,
nor do we need to manage the memory.
 Large Community Support: Python has gained popularity over the
years. Our questions are constantly answered by the enormous
StackOverflow community. These websites have already provided answers
to many questions about Python, so Python users can consult them as
needed.
 Easy to Debug: Python provides excellent information for error tracing.
Once you understand how to interpret Python's error traces, you can
quickly identify and correct most of your program's issues. Simply by
glancing at the code, you can determine what it is designed to do.
 Python is a Portable language: Python language is also a portable
language. For example, if we have Python code for Windows and if we
want to run this code on other platforms such as Linux, Unix, and Mac
then we do not need to change it, we can run this code on any platform.
 Python is an Integrated Language: Python is also an integrated language
because we can easily integrate Python with other languages like C, C++,
etc.
 Interpreted Language: Python is an interpreted language because Python
code is executed line by line. Unlike languages such as C, C++, and Java,
there is no separate compilation step, which makes it easier to debug code.
The source code of Python is converted into an intermediate form called
bytecode.
 Large Standard Library: Python has a large standard library that
provides a rich set of modules and functions so you do not have to write
your own code for every single thing. There are many libraries present in
Python such as regular expressions, unit-testing, web browsers, etc.

 Dynamically Typed Language: Python is a dynamically-typed language.
That means the type (for example- int, double, long, etc.) for a variable is
decided at run time not in advance because of this feature we don’t need to
specify the type of variable.

 Large community support: Python has a large and active community of


users and developers. This means that there is a wealth of resources avail-
able to help you learn and use Python
 Frontend and backend development: Python can be used for frontend
development work, much like JavaScript, but the backend is Python's
strong forte; it is extensively used for this work because of frameworks
like Django and Flask.
 Extensible: A programming language is said to be extensible if it can be
extended with other languages. Python can be extended with modules
written in languages like C and C++, making it a highly extensible
language.

5.5 TOOLS AND LIBRARIES


5.5.1 PANDAS

Pandas is an open-source, BSD-licensed Python library providing high-
performance, easy-to-use data structures and data analysis tools for the Python
programming language. Built on top of NumPy, it supports working with
structured (tabular, multidimensional, potentially heterogeneous) and time series
data. Pandas is a popular tool for data scientists, statisticians, and analysts
working with large and complex datasets, and a valuable tool for machine
learning engineers and other data professionals. It is a powerful and versatile
library that can be used for a wide variety of data science tasks, valuable for
anyone who works with data in Python. Pandas strengthens Python by giving the
language the capability to work with spreadsheet-like data, enabling fast loading,
aligning, manipulating, and merging, in addition to other key functions, and it is
prized for its highly optimized performance, with critical code paths written in
C or Cython.
FEATURES OF PANDAS

 Easy handling of missing data (represented as NaN) in both floating point


and non-floating-point data
 Size mutability: columns can be inserted and deleted from DataFrames and
higher-dimensional objects
 Automatic and explicit data alignment: objects can be explicitly aligned to a
set of labels, or the user can simply ignore the labels and
let series, DataFrame, etc. automatically align the data in computations
 Powerful, flexible group-by functionality to perform split-apply-combine
operations on data sets for both aggregating and transforming data
 Making it easy to convert ragged, differently indexed data in other Python
and NumPy data structures into DataFrame objects
 Intelligent label-based slicing, fancy indexing, and subsetting of large data
sets
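The group-by (split-apply-combine) feature above can be sketched on a small, invented prediction log; the disease names and confidence values are illustrative data, not results from the report.

```python
import pandas as pd

# Hypothetical prediction log for the disease classifier
df = pd.DataFrame({
    "disease": ["Brown spot", "Leaf blast", "Brown spot", "Bacterial blight"],
    "confidence": [0.91, 0.84, 0.77, 0.95],
})

# Split by disease, apply a mean, combine into a summary series
summary = df.groupby("disease")["confidence"].mean()
print(round(summary["Brown spot"], 2))  # 0.84
```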

5.5.2 PIL

The Python Imaging Library (PIL) is a free, open-source library that allows
users to open, manipulate, and save images in various formats. PIL is not part of
Python's standard library, but it provides a wide range of capabilities for image
processing tasks, including: Opening and saving image files, manipulating image
data, applying image filters and transformations, Resizing, and Cropping. PIL
supports many different image file formats, including popular formats such as
JPEG. It has been built and tested with Python 2.0 and newer, on Windows, Mac
OS X, and major Unix platforms. To install PIL, you can use the Python package
manager called pip: open your command prompt or terminal and run
`pip install pillow`. Pillow was announced as the replacement for PIL for future
usage; it supports a large number of image file formats including BMP, PNG,
JPEG, and TIFF, and the library encourages adding support for newer formats
by creating new file decoders.
FEATURES OF PILLOW
 Opening, manipulating, and saving images in many file formats (BMP, PNG, JPEG, TIFF, and more).
 Resizing, cropping, rotating, and other geometric transforms.
 Point operations, colour space conversions, and built-in image filters.
 Drawing text and shapes on images via ImageDraw.
 Efficient access to raw pixel data, for integration with libraries such as NumPy.
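A few of these operations can be sketched on a synthetic image, of the kind a leaf photograph might be pre-processed with; the sizes and blur radius are arbitrary illustrative choices.

```python
from PIL import Image, ImageFilter

# Create a small synthetic RGB image (stand-in for a leaf photograph)
img = Image.new("RGB", (128, 96), (40, 150, 60))

# Typical pre-processing before feeding images to a model
resized = img.resize((64, 64))                               # resize
cropped = resized.crop((8, 8, 56, 56))                       # central crop
blurred = cropped.filter(ImageFilter.GaussianBlur(radius=1)) # filter

print(blurred.size, blurred.mode)  # (48, 48) RGB
```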
5.5.3 PYTORCH

PyTorch is an open-source machine learning framework based on the Torch


library, used for applications such as computer vision and natural language
processing, originally developed by Meta AI and now part of the Linux
Foundation umbrella. It is recognized as one of the two most popular machine
learning libraries alongside TensorFlow, offering free and open-source software
released under the modified BSD license. PyTorch is written in Python and is
compatible with popular Python libraries like NumPy, SciPy, and Cython. It is
used in applications like: Image recognition, Language processing, Reinforcement
learning, and Natural language classification.
PyTorch is known for its flexibility and ease of use. It uses dynamic computation
graphs, which are more flexible than static graphs; dynamic graphs allow users to
interleave construction and evaluation of the graph. PyTorch is also known for its
speed, using GPU support to accelerate training and inference.

FEATURES OF PYTORCH

 It is easy to debug and understand the code.


 It includes many layers as Torch.
 It includes lot of loss functions.
 It can be considered as NumPy extension to GPUs.
 It allows building networks whose structure is dependent on computation
itself.
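The dynamic-graph and autograd behaviour described above can be sketched in a few lines; the graph is built as the code runs, and gradients flow back through it on demand.

```python
import torch

# requires_grad=True records operations on x into a dynamic graph
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x0^2 + x1^2 = 13
y.backward()         # dy/dx = 2x

print(y.item())         # 13.0
print(x.grad.tolist())  # [4.0, 6.0]
```

This is the mechanism the training loop relies on: the loss is computed, `backward()` fills in gradients, and the optimizer updates the weights.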

5.5.4 FLASK

Flask is a lightweight, open-source Python framework for building web


applications. It's a popular alternative to Django, which is a more complex full-
stack development framework. Flask is known for its simplicity and flexibility,
which makes it suitable for both small projects and larger applications. It's also
used in data science to create web interfaces, APIs, and visualization tools. Flask
is a microframework, which means it doesn't require particular tools or libraries. It
supports extensions that can add application features, such as: Object-relational
mappers, Form validation, Upload handling, Various open authentication
technologies, and Several common framework related tools. Flask is based on the
Werkzeug WSGI toolkit and the Jinja2 template engine, which are also Pocoo
projects.
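A minimal Flask application sketch (the route name and JSON payload are illustrative):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # A trivial JSON endpoint; real routes would render templates
    # or run model inference.
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(debug=True)  # development server with interactive debugger
```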
FEATURES OF FLASK
 Flask is a lightweight backend framework with minimal dependencies.
 Flask is easy to learn because its simple and intuitive API makes it easy
to use, even for beginners.
 Flask is a flexible framework because it allows you to customize and
extend it to suit your needs easily.
 Flask can be used with any database, SQL or NoSQL, and with any
frontend technology such as React or Angular.
5.5.5 NUMPY
NumPy is a Python library used for working with arrays. It's perfect for
scientific or mathematical calculations because it's fast and efficient. In addition,
NumPy includes support for signal processing and linear algebra operations.
NumPy arrays are stored at one continuous place in memory unlike lists, so
processes can access and manipulate them very efficiently. This behavior is called
locality of reference in computer science. This is the main reason why NumPy is
faster than lists. It is also optimized to work with the latest CPU architectures.
NumPy is a Python library and is written partially in Python, but most of the parts
that require fast computation are written in C or C++. NumPy arrays provide an
efficient way of storing and manipulating data. NumPy also includes a number of
functions that make it easy to perform mathematical operations on arrays. This
can be really useful for scientific or engineering applications. And if you’re
working with data from a Python script, using NumPy can make your life a lot
easier. The syntax of NumPy functions generally involves calling the function and
passing one or more parameters, such as the shape of the array, the range of values
to generate, or the type of data to use.
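A short sketch of the array-oriented style described above (the values are illustrative):

```python
import numpy as np

# A tiny 2x3 "image" of 8-bit pixel values.
pixels = np.array([[0, 128, 255],
                   [64, 32, 16]], dtype=np.float64)

# One vectorized expression normalizes every element - no Python loop.
normalized = pixels / 255.0

print(normalized.max())        # 1.0
print(np.zeros((2, 3)).shape)  # (2, 3): the shape parameter in action
```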
FEATURES OF NUMPY
 Python NumPy supports arrays with different numbers of dimensions.
 It supports broadcasting, a powerful and robust method for executing
operations on arrays of various shapes and sizes.
 NumPy functions well with other Python scientific computing libraries like
SciPy, Matplotlib, and Pandas.

 NumPy offers a variety of math operations, such as linear algebra functions,
trigonometric functions, and fundamental arithmetic operations.

5.5.6 CLICK
Click is a Python package for creating beautiful command line interfaces
in a composable way with as little code as necessary. It is the "Command Line
Interface Creation Kit". It is highly configurable but comes with sensible defaults
out of the box. It aims to make the process of writing command line tools quick
and fun while also preventing any frustration caused by the inability to implement
an intended CLI API. At its simplest, just decorating a function with the
@click.command() decorator will make it into a callable script. Click remembers
parsed parameters, what command created it, which resources need to be cleaned
up at the end of the function, and so forth. It can also optionally hold an
application-defined object. The most important and final step towards making a
good CLI is to provide documentation for the code. Click renders nicely
formatted help text on the command line, built from the optional help argument
and the docstring specified in the function.
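A minimal sketch of a Click command (the command name, argument, and option are illustrative):

```python
import click

@click.command()
@click.option("--count", default=1, help="Number of greetings.")
@click.argument("name")
def greet(name, count):
    """Greet NAME the given number of times."""  # shown by --help
    for _ in range(count):
        click.echo(f"Hello, {name}!")

if __name__ == "__main__":
    greet()
```

Running the script with `--help` would print help text generated from the option declarations and the docstring.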
FEATURES OF CLICK
 Click automatically generates help pages, which saves you a lot of time
and effort and ensures that your help pages are always up-to-date.
 Click supports nested commands, which allows you to create complex
CLIs with a hierarchical structure.
 Click allows you to pass context objects between commands, which can
help you to prevent errors and make your CLI more user-friendly.
 Click provides a variety of options that you can use to customize your
CLI, including options for specifying input and output files, setting
verbosity levels, and enabling or disabling features.
5.5.7 TORCHVISION
The torchvision library is an essential component of the PyTorch ecosystem,
designed to facilitate computer vision tasks with ease and efficiency. Torchvision
offers a rich set of image transformations, such as resizing, cropping,
normalization, and data augmentation. These transformations are crucial for
preprocessing and augmenting image datasets, enhancing the model's
performance and generalization capability. Another significant aspect of
torchvision is its collection of pre-trained models, including renowned architectures
like ResNet, VGG, and AlexNet. These pre-trained models can be effortlessly
loaded and fine-tuned for specific tasks, eliminating the need to train models
from scratch and accelerating the development process. The library also provides
utilities for loading and processing images, simplifying the preparation of image
data for training and evaluation in PyTorch. These implementations streamline
the development of complex computer vision applications, enabling developers
to focus on designing innovative solutions rather than grappling with the
intricacies of model implementation. The library offers the necessary tools and
resources to build, evaluate, and deploy robust computer vision models efficiently.
FEATURES OF TORCHVISION
 Includes pre-trained models such as ResNet, VGG, and AlexNet that can be
loaded and fine-tuned for specific tasks, accelerating model development.
 Offers utilities for loading and processing images, simplifying the
integration of image data into PyTorch workflows.
 Supports GPU acceleration for faster training and inference, enhancing
performance and scalability.
CHAPTER 6

SYSTEM DESIGN

6.1 GENERAL

System design is the process of defining the interfaces, modules, and data of a
system so that the specified requirements are satisfied. System design can be
seen as the application of systems theory. The main goal of the design is to
develop the system architecture by providing the data and information necessary
for the implementation of the system.
6.2 SYSTEM ARCHITECTURE

Fig 6.1 System Architecture
6.3 USECASE DIAGRAM

A use case diagram at its simplest is a representation of a user's interaction
with the system that shows the relationship between the user and the different use
cases in which the user is involved. A use case diagram can identify the different
types of users of a system and the different use cases and will often be
accompanied by other types of diagrams as well.

Sign up

Login

Import Input
Data

Make prediction

Export
output

Logout

User Administrator

Fig 6.2 Use case Diagram

23
6.4 CLASS DIAGRAM

A class diagram is a type of static structure diagram that describes the
structure of a system by showing the system's classes, their attributes, operations
or methods, and the relationships among objects. The class diagram is the main
building block of object-oriented modelling. It is used for general conceptual
modelling of the structure of the application, and for detailed modelling
translating the models into programming code. Class diagrams can also be used
for data modelling.

Input Output
Dataset Acquisition Features extraction
Input image Dataset Classification

Preprocessing ( ) Finally get Classified &


Display Result ( ):Plant
Disease classification

Fig 6.3 Class Diagram

24
6.5 SEQUENCE DIAGRAM

A sequence diagram is a type of interaction diagram that shows how the
objects in a system interact with one another and in what order. It depicts the
objects involved in a scenario and the sequence of messages exchanged between
them to carry out the scenario's functionality. Sequence diagrams are commonly
used to model the flow of logic within a system, both for documentation and for
validating the design before implementation.

Image Dataset Training Testing


Input image dataset

Send the data to training stage

Pre-processing

Train the image

Extracted feature and send to


testing stage Give input test image

Predict the Plant Disease type


using proposed algorithm

Fig 6.4 Sequence Diagram

25
6.6 DATA FLOW DIAGRAM

A data-flow diagram (DFD) is a way of representing the flow of data through
a process or a system (usually an information system). The DFD also provides
information about the outputs and inputs of each entity and the process itself. A
data-flow diagram has no control flow: there are no decision rules and no loops.

Input Dataset

Pre processing

Training dataset

Model:Regression technique

Prediction / Classification Testing Data

:
Predicted

Fig 6.5 Data flow diagram

26
6.7 ACTIVITY DIAGRAM
Activity diagrams are graphical representations of workflows of
stepwise activities and actions with support for choice, iteration and
concurrency. In the Unified Modeling Language, activity diagrams can be used
to describe the business and operational step-by-step workflows of components
in a system. An activity diagram shows the overall flow of control.

Input Dataset

Preprocessing

Training

Model: InceptionV3

Predicted Results: Plant


Disease classification

Fig 6.6 Activity Diagram

27
6.8 FLOWCHART

A flowchart is a diagrammatic representation of the flow of data through a
process or a system (usually an information system), showing the inputs and
outputs of each step and the process itself.
Gather leaf images from various sources such as field surveys, online plant
disease databases, and experimental setups. Clean the data to handle corrupted
images, mislabelled samples, and inconsistencies. Normalize or standardize the
pixel values. Encode the categorical class labels.
Optimize model hyperparameters using techniques like grid search or
randomized search. Compare the performance of different models to select the
best-performing one. Use the trained model to make predictions on the testing
dataset. Compare predicted values with actual values from the testing dataset.
Calculate evaluation metrics to assess the accuracy and effectiveness of the
model's predictions.
Once satisfied with the model's performance, deploy it for practical use.
Integrate the model into an application or system where it can be used to classify
leaf diseases. Continuously monitor the model's performance in real-world
applications. Collect feedback and update the model periodically to ensure its
accuracy and relevance over time.
Fig 6.7 Flow chart (Plant image dataset → Pre-processing and feature selection
→ Model: InceptionV3 architecture → Predicted results: plant disease
classification; outputs: loss and model accuracy)
CHAPTER 7
MODULES
7.1 LIST OF MODULES
 Data Collection and Pre-Processing Module
 Model Creation Module
 Training and Evaluation Module
 Deployment Module
7.2 MODULES DESCRIPTION
7.2.1 Data Collection and Pre-Processing Module
The data collection and preprocessing module in leaf disease identification
involves gathering a diverse dataset of plant leaf images; these images undergo
preprocessing to enhance quality and feature extraction to capture relevant
information. The
data collection process involves gathering high-quality images of both diseased
and healthy leaves, which can be sourced from field surveys, online databases, or
experimental setups. It's essential to use high-resolution cameras or specialized
imaging devices to ensure the quality of the captured images. Consistent lighting
conditions and backgrounds should be maintained to minimize variability across
the dataset. Once the images are collected, they need to be annotated to indicate
whether they represent diseased or healthy leaves. This annotated dataset serves as
the foundation for training and validating the machine learning models used in the
identification system.
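A sketch of the preprocessing step described above (the helper name and target size are illustrative, and a generated image stands in for a field-collected photograph):

```python
import numpy as np
from PIL import Image

def preprocess_leaf(img, size=(224, 224)):
    """Resize a leaf image and scale pixel values into [0, 1]."""
    img = img.convert("RGB").resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0

# Synthetic stand-in for a collected leaf photograph.
sample = Image.new("RGB", (800, 600), color=(40, 120, 40))
arr = preprocess_leaf(sample)
print(arr.shape)  # (224, 224, 3)
```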
7.2.2 Model Creation Module

The Model Creation Module in leaf disease identification involves selecting
suitable machine learning algorithms, designing model architecture, and preparing
training data. It includes training the model to recognize disease patterns. The first
critical step involves selecting an appropriate machine learning or deep learning
model architecture suited for image classification tasks. Convolutional Neural
Networks (CNNs) are popular choices due to their ability to capture spatial
hierarchies of features, making them highly effective for analyzing image data.
Once the model type is chosen, the architecture is meticulously designed,
determining the number and types of layers, including convolutional, pooling, and
fully connected layers, along with the activation functions to be used. The
architecture is tailored to extract relevant features from the pre-processed images
that are indicative of specific leaf diseases, optimizing the model's performance
for disease classification.
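A minimal CNN sketch in PyTorch along the lines described — convolutional and pooling layers followed by a fully connected head (the layer sizes and the 3-class output are illustrative, not the project's exact architecture):

```python
import torch
import torch.nn as nn

class LeafCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        # Two conv/pool stages extract spatial features.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 224 -> 112
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 112 -> 56
        )
        # A fully connected layer maps the features to class scores.
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = LeafCNN()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 3])
```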
7.2.3 Training and Evaluation Module

The Training and Evaluation Module in leaf disease identification involves
training machine learning models on labeled data, typically using a subset for
validation. After training, the model is evaluated on a separate test set to assess its
accuracy and generalization ability. This module initiates with the preparation of
the pre-processed dataset, which is partitioned into three distinct subsets: training,
validation, and test sets. The training set is utilized to train the model, enabling the
machine learning algorithm to learn the intricate patterns and features inherent in
the labeled images. Periodic updates with new data ensure that the model remains
accurate, reliable, and adaptive to emerging disease patterns and environmental
factors, thereby maintaining its efficacy in real-world applications and
contributing to sustainable agriculture practices by enabling timely and accurate
disease detection and management.
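A compact sketch of the split-then-train-then-evaluate flow (synthetic tensors stand in for extracted leaf features, and a linear model keeps the example small):

```python
import torch
from torch import nn, optim
from torch.utils.data import TensorDataset, DataLoader, random_split

torch.manual_seed(0)

# Synthetic stand-in for labelled data: 100 samples, 8 features, 3 classes.
data = TensorDataset(torch.randn(100, 8), torch.randint(0, 3, (100,)))
train_set, val_set, test_set = random_split(data, [70, 15, 15])

model = nn.Linear(8, 3)
opt = optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):  # learn only from the training subset
    for xb, yb in DataLoader(train_set, batch_size=16, shuffle=True):
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()

# Generalization is judged on the held-out test subset.
xb, yb = next(iter(DataLoader(test_set, batch_size=len(test_set))))
acc = (model(xb).argmax(1) == yb).float().mean().item()
print(f"test accuracy: {acc:.2f}")
```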
7.2.4 Deployment Module

The Deployment Module for leaf disease identification employs Flask to
create a user-friendly web application. It includes a visually appealing interface
for image upload and integration. The Deployment module represents the final
and crucial stage where the trained and validated machine learning model is
integrated into real-world applications to enable automated and precise leaf
disease detection. This module comprises several essential components aimed at
ensuring seamless integration, efficient operation, and optimal performance of the
model in practical settings. The trained machine learning model is integrated into
the Flask application. This involves loading the model along with any
preprocessing or postprocessing code necessary for making predictions on
incoming leaf images.
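A sketch of the Flask integration described above (the endpoint name and the placeholder classifier are illustrative — in practice predict_disease would load and run the trained model):

```python
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

def predict_disease(img):
    """Placeholder for the trained model's inference code."""
    return "healthy"

@app.route("/predict", methods=["POST"])
def predict():
    file = request.files.get("leaf")
    if file is None:
        return jsonify(error="no image uploaded"), 400
    img = Image.open(file.stream).convert("RGB")  # preprocessing hook
    return jsonify(prediction=predict_disease(img))

if __name__ == "__main__":
    app.run()
```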
CHAPTER 8
SYSTEM TESTING
8.1 GENERAL
The purpose of testing is to discover errors. Testing is the process of trying
to discover every conceivable fault or weakness in a work product. It provides a
way to check the functionality of components, sub-assemblies, assemblies and/or
a finished product. It is the process of exercising software with the intent of
ensuring that the Software system meets its requirements and user expectations
and does not fail in an unacceptable manner. There are various types of test. Each
test type addresses a specific testing requirement.
8.2 TYPES OF TESTING
UNIT TESTING
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly, and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It is the
testing of individual software units of the application; it is done after the
completion of an individual unit, before integration. This is a structural testing
that relies on knowledge of its construction and is invasive. Unit tests perform
basic tests at component level and test a specific business process, application,
and/or system configuration. Unit tests ensure that each unique path of a business
process performs accurately to the documented specifications and contains clearly
defined inputs and expected results.
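As an illustration, a unit test for a hypothetical pixel-normalization helper might look like this (runnable directly or under pytest; the helper name is an assumption, not part of the project's actual code):

```python
import numpy as np

def normalize(pixels):
    """Scale 8-bit pixel values into the [0, 1] range."""
    return np.asarray(pixels, dtype=np.float32) / 255.0

def test_normalize_range():
    # Defined input and expected result, per the specification above.
    out = normalize([[0, 128, 255]])
    assert out.min() == 0.0
    assert out.max() == 1.0

test_normalize_range()
print("unit test passed")
```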
FUNCTIONAL TESTING
Functional tests provide systematic demonstrations that functions tested are
available as specified by the business and technical requirements, system
documentation, and user manuals.
Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
INTEGRATION TESTING
Integration tests are designed to test integrated software components to
determine if they actually run as one program. Testing is event driven and is more
concerned with the basic outcome of screens or fields. Integration tests
demonstrate that although the components were individually satisfactory, as
shown by successful unit testing, the combination of components is correct and
consistent. Integration testing is specifically aimed at exposing the problems that
arise from the combination of components.
PERFORMANCE TESTING

Performance testing ensures that the output is produced within the required
time limits, and measures the time taken by the system for compiling, for
responding to users, and for handling requests sent to the system to retrieve the
results.
SYSTEM TESTING
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An
example of system testing is the configuration-oriented system integration test.
System testing is based on process descriptions and flows, emphasizing pre-driven
process links and integration points.
VALIDATION TESTING

The final step involves validation testing, which determines whether the
software functions as the user expects. The end user, rather than the system
developer, conducts this test; most software developers use a process called
alpha and beta testing to uncover defects that only the end user seems able to
find. The completion of the entire project is based on the full satisfaction of the
end users. In this project, validation testing is applied in various forms: in the
question entry form, only the correct answer will be accepted in the answer box,
and answers other than the four given choices will not be accepted.
ACCEPTANCE TESTING
Normally this type of testing is done to verify that the system meets the
customer-specified requirements. The user or customer performs this testing to
determine whether to accept the application. User Acceptance Testing is a
critical phase of any project and
requires significant participation by the end user. It also ensures that the system
meets the functional requirements.
Acceptance testing for data synchronization:

 Acknowledgements will be received by the sender node after the packets
are received by the destination node.
 The route add operation is done only when there is a route request in need.
 The status of nodes information is updated automatically in the cache
updating process.
USABILITY TESTING

Usability testing checks whether the application is user-friendly: the
application flow is tested, whether a new user can understand the application
easily, and whether proper help is documented wherever the user may get stuck.
Basically, system navigation is checked in this testing.
WHITEBOX TESTING

White box testing is a testing in which the software tester has knowledge of
the inner workings, structure, and language of the software, or at least its
purpose. It is used to test areas that cannot be reached from a black box level.
BLACKBOX TESTING

Black box testing is testing the software without any knowledge of the inner
workings, structure, or language of the module being tested. Black box tests, as
most other kinds of tests, must be written from a definitive source document,
such as a specification or requirements document. It is a testing in which the
software under test is treated as a black box: you cannot "see" into it. The test
provides inputs and responds to outputs without considering how the software
works.
CHAPTER 9
IMPLEMENTATION
Implementing a leaf disease identification system using machine learning is
a comprehensive process that requires meticulous planning and execution. The
journey begins with data collection, where you must assemble a diverse and
comprehensive dataset. This dataset should include images of both healthy and
diseased leaves from various plant species. Each image should be meticulously
labeled to indicate the type of disease it represents or labeled as a healthy leaf.
This labeling process is crucial as it forms the foundation upon which the machine
learning model will be trained to recognize and differentiate between healthy and
diseased leaves.
After assembling the dataset, the next phase is data preprocessing, which is
pivotal for ensuring the quality and relevance of the data. During this phase, you'll
need to resize the images to a uniform dimension to maintain consistency across
the dataset. Data augmentation techniques can also be applied to artificially
increase the dataset's size and diversity. Techniques such as rotation, flipping, and
zooming can be employed to introduce variations in the dataset, which helps the
model generalize better to unseen data. Additionally, normalizing the pixel values
of the images to a standardized range, typically between 0 and 1, can significantly
improve the model's training efficiency and convergence.
Once the data is preprocessed and ready, the subsequent step involves feature
extraction. In this phase, the goal is to identify and extract meaningful patterns or
features from the images that can be used for classification. Various techniques
can be employed for feature extraction, ranging from traditional methods like
Histogram of Oriented Gradients (HOG) and color histograms to more advanced
approaches using deep learning features. The choice of feature extraction method
largely depends on the nature of the dataset and the complexity of the diseases
you're trying to identify.
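A sketch of one such hand-crafted feature — a per-channel color histogram (the bin count and the synthetic stand-in image are illustrative):

```python
import numpy as np

def color_histogram(img, bins=8):
    """Concatenated per-channel color histogram of an RGB image."""
    img = np.asarray(img)
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 256),
                          density=True)[0]
             for c in range(3)]
    return np.concatenate(feats)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in image
print(color_histogram(img).shape)  # (24,): 8 bins x 3 channels
```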
With the extracted features in hand, the focus shifts to model selection and
training. Convolutional Neural Networks (CNNs) have emerged as a popular
choice for image classification tasks due to their ability to automatically learn
hierarchical features from the data. However, other machine learning algorithms
like Support Vector Machines (SVMs) or Random Forests can also be effective
depending on the dataset's characteristics. Once the model is selected, it needs to
be trained using the preprocessed data. The dataset is typically divided into
training, validation, and test sets to facilitate model training, evaluation, and
performance tuning. During training, the model's hyperparameters can be
optimized using techniques like grid search or random search to enhance its
performance.
After the model has been trained and fine-tuned, the next step is to evaluate
its performance rigorously. This involves testing the model on the validation and
test datasets using various performance metrics such as accuracy, precision, recall,
and F1-score. These metrics provide valuable insights into the model's strengths
and weaknesses, helping you assess its ability to generalize to new, unseen data
accurately.
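These metrics can be computed directly from the confusion counts; a small sketch for the binary healthy/diseased case (the label vectors are purely illustrative):

```python
import numpy as np

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 score for one positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 1 = diseased, 0 = healthy.
p, r, f = precision_recall_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```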
Once the model's performance meets the desired criteria, it can be deployed
into a production environment. This deployment phase involves integrating the
trained model into a user-friendly interface, such as a web application or mobile
app. Users can then upload images of leaves to the interface, and the model will
provide predictions regarding the presence of diseases in the leaves. Post-
deployment, it's crucial to establish a robust monitoring system to continuously
track the model's performance and make necessary updates or improvements as
new data becomes available or as the model encounters new challenges.
CHAPTER 10
CONCLUSION AND FUTURE ENHANCEMENT
10.1 CONCLUSION
In this study, a model for identifying bacterial blight, blast, sheath rot, and
brown spot diseases was developed. To set up the classification algorithm, image
processing techniques such as segmentation and feature extraction, together with
two classifiers, were used. Color and area-aware shape features were extracted
in the feature extraction stage and used as input to the classifier. The analysis
above demonstrates the generality of the proposed approach: CNN outperforms
the other machine learning techniques evaluated, as well as conventional
methods, for detecting plant diseases. We have also developed a CNN-based
web application for blast prediction, a first of its kind, to support the plant
science community and farmers in their decision-making processes. The new
method is found to be superior in terms of time complexity, accuracy, and the
number of diseases covered. By leveraging advanced algorithms and computer
vision techniques,
machine learning models enable early detection of leaf diseases, facilitating timely
interventions and reducing crop losses. The high accuracy, reliability, and
adaptability of these models empower farmers with actionable insights, enabling
informed decision-making and optimizing resource allocation. Furthermore, the
integration of machine learning with Internet of Things (IoT) and edge computing
technologies enables real-time monitoring and management of plant health,
fostering sustainable agricultural practices and enhancing crop yield and quality.
As research and innovation continue to advance in this area, machine learning
holds the promise to further enhance the resilience, productivity, and sustainability
of agriculture, contributing to global food security and environmental
conservation.
10.2 FUTURE WORK
In this work, color and shape features were extracted. Texture features can
also be added and their effect on the algorithm's performance evaluated. More
diseases exist beyond the four covered in this work, and future effort may focus
on them. Other crops can use the same methods with only minor adjustments:
the work was carried out for a single crop, but the same methodology can be
extended to other crops, considering only the disease symptoms specific to each
crop, and should provide positive outcomes. Furthermore, the
incorporation of explainable AI techniques could foster trust and adoption
among users by enhancing model interpretability. Developing transparent and
interpretable machine learning models that provide insights into the decision-
making process can enable users to understand and validate the model's
predictions effectively, fostering confidence in its capabilities. Optimizing
model scalability and deployment, particularly on edge devices with limited
computational resources, is another area ripe for exploration. Efforts to develop
lightweight models and efficient algorithms tailored for edge computing
environments could facilitate broader adoption and implementation of machine
learning-based leaf disease identification systems in diverse agricultural
settings. By addressing these areas of focus through interdisciplinary
collaboration, research, and innovation, we can unlock new opportunities to
improve crop health, productivity, and resilience, ultimately contributing to
global food security, environmental sustainability, and economic prosperity.
APPENDIX-1
SOURCE CODE