Project Report
on
Tomato Leaf Disease Detection System
Based on Convolutional Neural Network
Bachelor of Technology
in
Computer Science and Engineering
by
Akshay Srivastava (1809710012)
Anmol Agrawal (1809710018)
Anurag Verma (1809710020)
June, 2021
GALGOTIAS COLLEGE OF ENGINEERING & TECHNOLOGY
GREATER NOIDA, UTTAR PRADESH, INDIA- 201306.
CERTIFICATE
This is to certify that the project report entitled “Tomato Leaf Disease Detection System Based on Convolutional Neural Network”, submitted by Mr. Akshay Srivastava (1809710012), Mr. Anmol Agrawal (1809710018) and Mr. Anurag Verma (1809710020) to the Galgotias College of Engineering & Technology, Greater Noida, Uttar Pradesh, affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, in partial fulfilment of the requirements for the award of the Degree of Bachelor of Technology in Computer Science & Engineering, is a bona fide record of the project work carried out by them under my supervision during the year 2020-2021.
ACKNOWLEDGEMENT
We have put considerable effort into this project. However, it would not have been possible without the kind support and help of many individuals and organizations. We would like to extend our sincere thanks to all of them.
We are highly indebted to Mr. Dinesh Babu for his guidance and constant supervision, for providing the necessary information regarding the project, and for his support in completing it.
We are extremely indebted to Dr. Vishnu Sharma, HOD, Department of Computer Science and Engineering, GCET, and Dr. Jaya Sinha, Project Coordinator, Department of Computer Science and Engineering, GCET, for their valuable suggestions and constant support throughout our project tenure. We would also like to express our sincere thanks to all faculty and staff members of the Department of Computer Science and Engineering, GCET, for their support in completing this project on time.
We also express our gratitude to our parents for their kind cooperation and encouragement, which helped us complete this project. Our thanks and appreciation also go to our friends who helped in developing the project and to all the people who have willingly helped us with their abilities.
Akshay Srivastava
Anmol Agrawal
Anurag Verma
ABSTRACT
Farmers who cultivate tomatoes suffer severe economic losses every year, primarily because of the various diseases that infect tomato plants. This project aims to detect these diseases accurately at an early stage, which can reduce wastage and mitigate economic losses if timely and appropriate measures are taken after the disease is detected. Methods based on machine learning and deep learning can be used to solve such problems. Specifically, this project uses deep learning algorithms based on convolutional neural networks to build a classification model that accurately classifies leaf images and identifies the disease. The data used in this project was obtained from Kaggle. The model addresses a classification problem: detecting whether a leaf is healthy or unhealthy and, if it is unhealthy, which disease it has.
KEYWORDS: Deep learning; Convolutional neural network; Image classification; Machine learning;
Artificial Intelligence; Tomato disease classification
Table of Contents
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
CONTENTS
LIST OF TABLES
LIST OF FIGURES
NOMENCLATURE
ABBREVIATIONS
CHAPTER 1: INTRODUCTION
REFERENCES
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
ML Machine Learning
HTML HyperText Markup Language
CSS Cascading Style Sheets
KNN K-Nearest Neighbours
DT Decision Trees
SVM Support Vector Machines
CSV Comma Separated Values
CNN Convolutional Neural Network
BRBFNN Bacterial Foraging Optimization based Radial Basis Function Neural Network
TC Time Complexity
Chapter 1: Introduction
Diagnosing disease in a tomato plant may begin by identifying the infected area of the plant, then noting changes such as brown or black patches and holes, and finally looking for insects. Tomatoes and related crops such as potatoes or brinjal should not be grown on the same farm more than once in three years. To preserve soil fertility, a member of the grass family, such as wheat, corn, rice or sugarcane, should be planted before tomatoes.
Problems affecting tomatoes fall into two categories: 16 diseases caused by bacteria, fungi or poor cultivation practices, and 5 additional diseases caused by insects. Bacterial wilt, for example, is caused by the bacterium Ralstonia solanacearum. This bacterium can survive in the soil for a long time and enters the roots through natural wounds created when secondary roots emerge, through man-made wounds created during cultivation or transplanting, or through insects. The disease thrives in hot, humid conditions. The bacterium multiplies rapidly inside the plant's water-conducting tissue and fills it with slime. This damages the plant's vascular system, although the leaves may remain green. In cross-section, an infected stem appears brown, with a yellowish ooze flowing out of it.
In this study, we propose a novel approach for identifying disease in tomato crops by analysing photographs of their leaves.
With such a system, farmers will be able to detect plant diseases without having to seek out plant scientists. It will help them treat the plant's disease in a timely manner, improving the quality and quantity of the food crops produced as well as the farmer's profit. For this experiment, we obtained the tomato leaf dataset from PlantVillage. After downloading the dataset, we built a Convolutional Neural Network model to classify the images. The model's performance was evaluated against a pre-trained model using several factors, such as training accuracy, validation accuracy and testing accuracy, as well as the number of trainable and non-trainable parameters. The remainder of the report is structured as follows: Section 2 reviews the literature on available approaches, Section 3 discusses the dataset, and Section 4 presents the experimental setup and discusses the results, followed by the conclusion.
Tomatoes are an essential crop enjoyed by millions of people all over the world. They contain the three most significant antioxidants: vitamin E, vitamin C and beta-carotene. They are also high in potassium, a mineral that is essential for overall health. Tomatoes have a high market value and are produced on a massive scale. Unfortunately, plant diseases cause a large portion of the overall tomato yield to be lost each year, and diseases that are detected late or incorrectly increase this loss. It is therefore critical to monitor the crop from an early stage in order to minimise losses.
Disease causes colour changes in the leaves, as well as spots, damage and loss of tomato fruit. Some diseases have symptoms that are not visible to the naked eye, while others have visible signs that are difficult to interpret. Many farmers draw incorrect conclusions about a disease because the patterns on the leaf are hard to recognise at a glance. As a result, their preventive measures may be ineffective and, in some cases, harmful. Many farmers have no access to professional advice on how to deal with crop infections, and overdosing or underdosing pesticide, sometimes caused by a lack of understanding or a misjudgement of the disease's severity, has damaged crops and increased output losses. Because manual disease monitoring is difficult, automated disease detection in tomato plants will help farmers identify diseases and reduce costs. Timely disease identification helps prevent losses and achieve a high yield, so an effective method for detecting tomato disease early is essential.
Today, deep learning has become one of the most accurate approaches to detecting disease in plants. Infected leaves are collected and labelled according to disease, and the labelled images are then converted into pixel arrays for processing. Neural network models classify images into classes using automatic feature extraction. After feature extraction, an optimal subset of features is selected and a classification technique is then applied. Convolutional neural networks are among the best deep learning techniques for identifying objects. A Convolutional Neural Network (CNN) is a deep learning algorithm that takes an input image, assigns importance (learnable weights) to various aspects or objects in the image, and learns to differentiate one from another. It consists of an input layer, hidden layers and an output layer. An attractive feature of CNNs is that they learn to extract features from images for classification automatically during training. CNNs have been used in many applications, such as object detection, text detection in images, water leakage detection, biomedical image analysis and face detection, and have achieved good performance.
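To make the convolution and pooling operations described above concrete, here is a minimal, dependency-free Python sketch. It is illustrative only: the function names are hypothetical, it works on plain nested lists rather than a deep learning library, and, like most frameworks, it computes cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with no padding and stride 1.

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i + a][j + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Non-overlapping 2x2 max pooling; an odd last row/column is dropped."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# Usage: a 6x6 image whose right half is bright, and a vertical-edge kernel.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
kernel = [[-1, 0, 1] for _ in range(3)]
feature_map = conv2d_valid(image, kernel)  # 4x4 map, strongest along the edge
pooled = max_pool2x2(relu(feature_map))    # 2x2 map after pooling
print(pooled)
```

The kernel responds only where brightness changes from left to right, and pooling halves the resolution while keeping the strongest response in each 2 x 2 window. In a CNN the kernel values are learned rather than hand-written.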
It is important to review previous research in this field in order to advance in the right direction. Plant leaf disease detection has been a major research area in which both image processing and deep learning techniques have been widely used for accurate classification.
The authors in [4] used the Histogram of Oriented Gradients (HOG) to extract features and fed them to the classification model. They then classified the leaves, identified the disease and sent the results to the farmer by message. They took leaves of tomato crops and identified the disease using SVM and ANN algorithms. Their methodology included the following stages: data collection, preprocessing, feature extraction, image segmentation and classification. The dataset contained 200 tomato and maize leaf images. Of these, 50 images showed healthy tomato and maize leaves and 110 images of tomato and maize leaves were used for training and testing; a further 40 leaf images were used for testing. Their Artificial Neural Network (ANN) had input, hidden and output layers. An ANN consists of nodes, and each connection carries the output of one node to another node. The batch size was 20. The initial learning rate was set to 0.01 and was reduced by a factor of 0.3 on plateaus where the loss stopped decreasing. Early stopping was also used to monitor the validation loss and stop the training procedure as soon as it increased. Both tomato and corn crops were evaluated using the SVM and ANN classifiers.
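The learning-rate and early-stopping rules reported for [4] can be sketched as a loop over per-epoch validation losses. This is a simplified illustration, not the authors' code: the function name and the `min_delta` threshold used to define a plateau are assumptions, and real frameworks (for example the ReduceLROnPlateau and EarlyStopping callbacks in Keras) add options such as patience.

```python
def replay_schedule(val_losses, lr=0.01, factor=0.3, min_delta=1e-3):
    """Replay per-epoch validation losses under two simple rules.

    Reduce-on-plateau: if the loss improved by less than `min_delta`,
    multiply the learning rate by `factor` (0.3, as reported in [4]).
    Early stopping: stop as soon as the loss increases.

    Returns the learning rate used in each epoch that ran.
    `min_delta` is an illustrative assumption, not a value from [4].
    """
    lrs = []
    previous = float("inf")
    for loss in val_losses:
        lrs.append(lr)                   # learning rate used for this epoch
        if loss > previous:              # validation loss went up: stop
            break
        if previous - loss < min_delta:  # plateau: shrink the learning rate
            lr *= factor
        previous = loss
    return lrs


history = replay_schedule([1.0, 0.8, 0.7995, 0.7, 0.75, 0.6])
print(len(history), history)
```

In this toy run the third epoch barely improves, so the learning rate drops from 0.01 to about 0.003, and training stops after the fifth epoch because the validation loss rises.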
For the tomato crop, SVM gave 60-70% accuracy and ANN gave 80-85%. For corn, SVM gave 70-75% and ANN gave 55-65%. The approach is time-consuming and slow, and the highest accuracy achieved was 85%.
In [5], the authors constructed two models using deep convolutional neural networks and object detection architectures to identify diseased tomatoes. Images were collected from the internet and screened carefully to ensure correct correspondence between images and disease types. After careful examination, images of tomato diseases were sorted, including tomato malformed fruit, tomato blotchy ripening, tomato puffy fruit, tomato dehiscent fruit, tomato blossom end rot, tomato sunscald, tomato virus disease, tomato gray mold, tomato ulcer disease and tomato anthracnose. In total, 286 tomato images, including images of healthy tomatoes, were used in the experiments. The two object detection architectures required the images in the training set to be annotated in two different ways. The training set was used to train the model, the validation set gave feedback on the progress of training and determined when training was complete, and the trained model was finally applied to the test set to evaluate its performance. However, this work detected tomato plant disease at the fruit stage, which means the disease is found late, when treatment is difficult.
Segmentation of tomato leaf images based on an adaptive clustering number for the K-means algorithm was carried out in [6]. All experimental images were acquired from tomatoes the authors grew themselves. Images with a white paper background were used to design the algorithm, and images with natural backgrounds were used to validate it.
Through a series of pretreatment experiments, the clustering number was determined automatically by calculating the Davies-Bouldin index, and the initial cluster centres were specified to prevent the clustering from falling into a local optimum. Finally, the authors verified the accuracy of segmentation using two objective assessment measures: the clustering F1 measure and entropy. K-means requires the number of clusters K in advance, and it is usually set from prior knowledge. Finding an appropriate number of clusters can be difficult, even though it is key to obtaining optimal segmentation results. The method failed to achieve high disease identification accuracy.
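The idea of choosing the number of clusters with the Davies-Bouldin index, as in [6], can be illustrated on one-dimensional grey-level values. This sketch is a simplification under stated assumptions: it clusters scalar intensities rather than colour pixels, uses a hypothetical deterministic initialisation instead of the one derived in [6], and picks the K with the lowest index (lower Davies-Bouldin values indicate more compact, better separated clusters). All function names are hypothetical.

```python
def kmeans_1d(values, k, iters=50):
    """Plain K-means on scalar values; returns (centres, clusters)."""
    data = sorted(values)
    # Assumed initialisation: centres spread evenly across the sorted data.
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Empty clusters keep their previous centre.
        new = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new == centres:  # converged
            break
        centres = new
    return centres, clusters


def davies_bouldin(centres, clusters):
    """Average, over clusters, of the worst (s_i + s_j) / d(c_i, c_j) ratio."""
    k = len(centres)
    scatter = [sum(abs(v - centres[i]) for v in c) / len(c) if c else 0.0
               for i, c in enumerate(clusters)]
    total = 0.0
    for i in range(k):
        ratios = []
        for j in range(k):
            if j == i:
                continue
            d = abs(centres[i] - centres[j])
            ratios.append(float("inf") if d == 0 else (scatter[i] + scatter[j]) / d)
        total += max(ratios)
    return total / k


def choose_k(values, k_max=5):
    """Return the K in 2..k_max with the lowest Davies-Bouldin index."""
    scores = {k: davies_bouldin(*kmeans_1d(values, k)) for k in range(2, k_max + 1)}
    return min(scores, key=scores.get), scores


# Usage: three well-separated groups of grey levels.
best, scores = choose_k([10, 11, 12, 100, 101, 102, 200, 201, 202])
print(best, scores)
```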
The objective of the work in [1] was to classify plant diseases from leaf images using the Extreme Learning Machine (ELM), a machine learning classification algorithm based on a single-layer feed-forward neural network. The images were pre-processed by converting them to the HSV colour space, and features were extracted using Haralick textures. These features were used to train and test the ELM classifier, and its accuracy was calculated after testing. The dataset consisted of tomato plant leaves from a subset of the PlantVillage dataset. The ELM achieved an accuracy of 84.94%, higher than other models such as the Support Vector Machine and Decision Tree. However, the model required a long time for training and processing, and it had a high error rate when classifying lower-resolution images.
In [3], a comparative study was carried out on five machine learning classification techniques for recognising plant disease.
The first stage of the plant disease detection system was image acquisition. High-quality plant images were acquired using digital cameras, scanners or drones, and a knowledge-based dataset with different classes was created from the captured images.
The acquired images then went through pre-processing to enhance image features that are important for further processing. Segmentation was used to partition the plant image into segments in order to extract the diseased areas of the leaves. Many authors prefer the SVM classifier over other classifiers for disease classification. However, the results of this work show that the CNN classifier detects a greater number of diseases with high accuracy. The authors suggest that other machine learning classifiers, such as decision trees and the Naive Bayes classifier, may be used in future to help farmers detect all types of crop diseases automatically. Classification was accurate only for distinguishing healthy from unhealthy leaves; further classification of unhealthy leaves was imprecise.
The KFHT-RLPBC technique used in [2] comprises three stages: pre-processing, feature extraction and classification. Leaf images were gathered from a plant dataset. In pre-processing, noise in the input leaf images was removed using Kuan filters to improve image quality and achieve higher disease detection accuracy. The Hough transform was then used to extract shape, texture and colour features. The technique reduced the time complexity (TC) of disease identification through this feature extraction process.
Finally, classification was done by applying reweighted linear program boost classification (RLPBC) to identify the disease at an earlier stage by constructing a number of weak learners. The boosting classifier combined the results of the weak learners into a strong classifier to achieve higher disease detection accuracy with minimum error. Experimental evaluation was performed on a PlantVillage dataset using parameters such as peak signal-to-noise ratio (PSNR), disease detection accuracy and time complexity. The results confirmed that the KFHT-RLPBC technique improved PSNR and reduced time complexity compared with existing works. However, the accuracy achieved was only 88%.
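Peak signal-to-noise ratio, one of the evaluation parameters used in [2], compares a processed image with a reference: PSNR = 10 log10(MAX^2 / MSE), where MSE is the mean squared pixel difference and MAX is the largest possible pixel value. A minimal Python sketch for 8-bit greyscale images (MAX = 255) follows; the function name is hypothetical.

```python
import math


def psnr(reference, processed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized greyscale images."""
    ref = [p for row in reference for p in row]
    out = [p for row in processed for p in row]
    if len(ref) != len(out):
        raise ValueError("images must have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)


image = [[10, 20], [30, 40]]
noisy = [[11, 21], [31, 41]]  # every pixel off by one, so MSE = 1
print(psnr(image, noisy))
```

Higher PSNR means the filtered image is closer to the reference, which is why a denoising step such as the Kuan filter is judged by an increase in PSNR.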
In [7], the authors used a Random Forest to distinguish between healthy and diseased leaves in the datasets they created. Random forests are an ensemble learning method for classification, regression and other tasks that works by constructing many decision trees at training time. Unlike a single decision tree, a random forest reduces overfitting to the training set, and it handles both numeric and categorical data. Their work comprised several phases: dataset creation, feature extraction, classifier training and classification. The datasets of diseased and healthy leaves were used together to train the Random Forest to classify diseased and healthy images, and image features were extracted using the Histogram of Oriented Gradients (HOG). Overall, training machine learning models on large, publicly available datasets offers a clear way to detect plant disease on a very large scale. However, the random forest achieved an accuracy of only 70%.
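Both [4] and [7] rely on HOG features. Their core step is building, for each small cell of the image, a histogram of gradient orientations weighted by gradient magnitude. The sketch below shows only that step for one greyscale cell. It is a simplification (no interpolation between bins, no block normalisation, border pixels skipped), and the function name is hypothetical.

```python
import math


def hog_cell_histogram(cell, bins=9):
    """Unsigned (0-180 degree) gradient-orientation histogram for one greyscale cell."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for i in range(1, h - 1):          # interior pixels only
        for j in range(1, w - 1):
            gx = cell[i][j + 1] - cell[i][j - 1]   # central differences
            gy = cell[i + 1][j] - cell[i - 1][j]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
horizontal_edge = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
print(hog_cell_histogram(vertical_edge))
print(hog_cell_histogram(horizontal_edge))
```

A vertical edge puts all its weight in the 0-degree bin and a horizontal edge in the bin covering 80-100 degrees; concatenating such histograms over all cells gives the feature vector passed to a classifier such as an SVM or a random forest.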
Automatic segmentation of diseases from plant leaf images using a soft computing approach was carried out in [8]. The authors introduced a method named Bacterial Foraging Optimization based Radial Basis Function Neural Network (BRBFNN) for identifying and classifying plant leaf diseases automatically. They used Bacterial Foraging Optimization (BFO) to assign optimal weights to the Radial Basis Function Neural Network (RBFNN), which increased the speed and accuracy of the network in identifying and classifying the regions of plant leaves infected by different diseases. A region growing algorithm increased the efficiency of the network by searching for and grouping seed points with common attributes for the feature extraction process. They worked on fungal diseases such as common rust, cedar apple rust, late blight, leaf curl, leaf spot and early blight. The proposed method attained higher accuracy in identifying and classifying diseases. However, noisy pixels were not removed with a filtering technique.
In [9], the authors trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images of the ImageNet LSVRC-2010 contest into 1000 different classes. On the test data, they achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively. The network consisted of five convolutional layers, some of which were followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. They used non-saturating neurons and a very efficient GPU implementation of the convolution operation, and they employed a regularisation method called “dropout”.
To simplify their experiments, they did not use any unsupervised pre-training. They made the network larger and trained it for longer. The large number of layers required a huge amount of processing time, and the network was not specialised for classifying crop diseases.
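The top-1 and top-5 error rates reported in [9] count a prediction as wrong when the true class is not among the k highest-scoring classes. A small sketch of the metric follows; the function name and the toy scores are hypothetical.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top:
            misses += 1
    return misses / len(labels)


# Three samples, three classes; each row holds the model's class scores.
scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_error(scores, labels, 1), top_k_error(scores, labels, 2))
```

The second sample is wrong at top-1 but correct at top-2, so the top-k error can only fall as k grows; this is why the top-5 figure in [9] is much lower than the top-1 figure.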
In [10], the authors presented an image segmentation algorithm for the automatic detection and classification of plant leaf diseases. The paper also surveyed disease classification techniques that can be used for plant leaf disease detection. Image segmentation, an important step in plant leaf disease detection, was performed using a genetic algorithm. Image segmentation is the process of separating or grouping an image into different parts, which normally correspond to something that humans can easily separate and view as individual objects. Methods range from simple thresholding to advanced colour image segmentation. Because computers have no means of intelligently recognising objects, many different segmentation methods have been developed, based on various features found in the image. However, the segmentation approach did not use any classification technique to improve the recognition rate.
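As an example of the simple thresholding end of the segmentation spectrum mentioned above (not the genetic-algorithm method of [10]), the sketch below uses Otsu's method, which picks the grey level that maximises the between-class variance, and then marks darker pixels as candidate lesion pixels. Treating lesions as darker than healthy tissue is an assumption made for illustration, and the function names are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that best splits pixels into <= t and > t (Otsu's method)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]              # pixels in the dark class
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0            # pixels in the bright class
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2   # between-class variance (unscaled)
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def threshold_segment(image):
    """Binary mask: 1 for pixels at or below the Otsu threshold (assumed lesion), else 0."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


print(threshold_segment([[20, 200], [200, 20]]))
```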
Authors | Title | Limitation
Qimei W, Feng Q, Minghe S, Jianhua Q, Jie X [2019] | Identification of Tomato Disease Types and Detection of Infected Areas Based on Object Detection Techniques | Detection of disease in fruit results in later-stage detection of the disease, during which treatment is difficult.
Leaves are a delicate part of the plant, and their condition is of great importance in assessing the farm harvest. A leaf's texture and colour are its most significant visual properties. It is therefore important to identify leaf diseases in order to assess farm products, enhance their market value and comply with quality standards. It is also important to recognise diseases early and take measures to prevent them from spreading further. Manual methods are too slow: when classification and categorisation are done by hand, professionals are needed to ensure they are not done incorrectly, and such professionals are not readily available. Manual categorisation relies on visual properties such as colour and scale. When these procedures are implemented in an automated system using appropriate software, the work becomes faster and less error-prone. Machine learning methods for plant disease detection must achieve two main goals: speed and precision. Technologies for the automated detection and classification of plant diseases using image processing techniques are therefore required. Such technologies are useful for farmers and can warn them in time, before the disease spreads over a wide area.
● Existing Methods:-
One existing solution could only classify leaves as healthy or unhealthy; its further classification of unhealthy leaves was not specific.
Image processing techniques are widely used in agriculture. Machine learning can be used to identify plant leaf diseases as part of smart farming, and studies have proposed various techniques for plant disease recognition. Most studies use a support vector machine (SVM) as the core of disease recognition to detect leaf disease.
● Problem Identification:-
A major disadvantage shared by all of the research works discussed above is that they use a very high number of training parameters. Training a model with so many parameters requires either a lot of training time or a machine with high computational power. This motivated us to reduce the number of training parameters used for plant disease detection without a large decrease in classification accuracy. Hence, in this work, a novel hybrid model is proposed that reduces the dimensionality of the input leaf image using a convolutional autoencoder (CAE) before classifying it with a CNN. Reducing the dimensionality of plant leaf images before classification reduces the number of training parameters by a significant factor, which is the major finding of this research work.
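Why input dimensionality matters for the parameter count can be seen from the standard formulas: a convolutional layer has (kernel height x kernel width x input channels + 1) x filters parameters, and a dense layer has (inputs + 1) x units, the +1 being the bias. The sketch below uses hypothetical helper names and arbitrary example sizes to compare a dense layer fed with a flattened 256 x 256 x 3 image against one fed with a 64 x 64 x 3 reduced representation.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, bias=True):
    """Trainable parameters of a 2-D convolutional layer."""
    return (kernel_h * kernel_w * in_channels + (1 if bias else 0)) * filters


def dense_params(inputs, units, bias=True):
    """Trainable parameters of a fully connected (dense) layer."""
    return (inputs + (1 if bias else 0)) * units


print(conv2d_params(3, 3, 3, 32))      # a 3x3 conv with 32 filters on an RGB input
full = dense_params(256 * 256 * 3, 64)   # flattened full-size image into 64 units
reduced = dense_params(64 * 64 * 3, 64)  # flattened reduced image into 64 units
print(full, reduced, full / reduced)
```

Shrinking each side of the input by a factor of 4 cuts the dense layer's parameters by roughly 16 times, while a convolutional layer's count does not depend on the image size at all. This is the effect a dimensionality-reduction step such as an autoencoder aims to exploit.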
Our literature review found many methods for detecting plant diseases, and various researchers have suggested techniques for detecting tomato leaf diseases. However, some of these solutions did not reach high accuracy, and others detected diseases in the fruit of the tomato plant rather than in the leaf, which leads to late detection.
● Disadvantages:-
1. It uses a very high number of training parameters.
2. It requires either a lot of training time or a machine with high computational power.
Chapter 4: Proposed work
The main aim of this work is to detect diseases in tomato plants early and with higher accuracy, since disease in tomato leaves can affect both the quality and the quantity of production. In our proposed work, we use a Convolutional Neural Network (CNN), a deep learning architecture, to classify 10 classes of tomato disease.
The proposed tomato disease detection system consists of a deployed module that classifies and categorises plant diseases. It examines the condition of a tomato leaf and classifies it using the CNN model.
In the CNN model, rescaling and data augmentation layers are added first, and the neural network is built on top of them. For feature extraction, we then add six convolutional and pooling layers in alternation. Previous research works use a very high number of training parameters and require either a lot of training time or a machine with high computational power. We aim to improve on this by increasing both the speed and the accuracy of our model, and we also aim to reduce the dimensionality of the input leaf image. The system takes an input image from any image-capturing device and aims to output the detected disease with high accuracy and precision.
For training and testing, we collected our dataset from various sources and mapped the dataset images to their respective diseases after converting the images into arrays. We split the dataset into 70% for training and 30% for testing, and train the best-performing configuration for a higher number of epochs.
Fig 4.1
● Advantages:-
Chapter 5: System Design
This chapter describes the whole system, including its operating concept and its practical and experimental configuration.
For clarity, the core system design and implementation are organised into four key parts.
1. Data Collection
The data for this project was acquired from Kaggle.com. It is a labelled collection of tomato leaf images, each showing either a healthy or a diseased plant. If the plant is diseased, the dataset indicates the name of the disease. The images are in RGB format and have a high resolution. Because most images encountered during prediction are expected to show unhealthy plants, the dataset contains more images of unhealthy plants than of healthy ones.
2. Data Visualization
Fig 5.1 shows nine images from the dataset, selected at random.
Fig 5.1
3. Data Preprocessing
The dataset was divided into training, validation and test sets in the ratio 8:1:1. The training set is used to train the neural network. The validation set provides an estimate of model skill while the model's hyperparameters are being tuned. After training is complete, the test set is used to compute the model's final accuracy.
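A minimal way to produce the 8:1:1 split described above is to shuffle the list of images with a fixed seed and slice it. The function name and seed are assumptions; in practice the split is often done per class so that each set keeps the same class proportions.

```python
import random


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle `items` reproducibly and split them into train/validation/test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed: same split on every run
    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```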
The RGB values of each image, originally in the range [0, 255], are rescaled to the range [0, 1]. Images are also resized to a standard size of 256 x 256 pixels. This gives the network homogeneous input data, which can help improve accuracy.
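The two preprocessing steps above can be sketched on an image stored as nested lists of RGB triples. Nearest-neighbour resizing is used here only because it is short; image libraries commonly use bilinear or bicubic interpolation instead. The function names are hypothetical.

```python
def rescale(image):
    """Map 8-bit RGB values from [0, 255] to [0, 1]."""
    return [[[channel / 255.0 for channel in pixel] for pixel in row] for row in image]


def resize_nearest(image, out_h=256, out_w=256):
    """Nearest-neighbour resize of an image given as rows of pixels."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


tiny = [[[0, 0, 0], [255, 255, 255]],
        [[255, 0, 0], [0, 255, 0]]]
prepared = rescale(resize_nearest(tiny))  # 256 x 256 image with values in [0, 1]
print(len(prepared), len(prepared[0]), prepared[0][255])
```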
Data augmentation increases the variety of the training set by applying random but realistic modifications to images, such as random rotations and flips and changes to image contrast and hue. This makes the model more robust when making predictions on a wide variety of photos.
Fig 5.2
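The augmentation step can be sketched as a random combination of a horizontal flip, a rotation by a multiple of 90 degrees and a contrast change, applied to a greyscale image with values in [0, 1]. The specific transforms, probabilities and contrast range are assumptions chosen for illustration, not the project's settings, and the function names are hypothetical.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(r) for r in zip(*img[::-1])]


def adjust_contrast(img, factor):
    """Scale deviations from the mean intensity by `factor`, clipped to [0, 1]."""
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    return [[min(1.0, max(0.0, (p - mean) * factor + mean)) for p in row] for row in img]


def augment(img, rng):
    """Apply a random flip, a random multiple-of-90 rotation and a random contrast change."""
    if rng.random() < 0.5:
        img = hflip(img)
    for _ in range(rng.randrange(4)):
        img = rot90(img)
    return adjust_contrast(img, rng.uniform(0.8, 1.2))


rng = random.Random(0)  # seeded so the example is reproducible
sample = [[0.1, 0.5, 0.9], [0.2, 0.5, 0.8], [0.3, 0.5, 0.7]]
print(augment(sample, rng))
```

Each call produces a slightly different but still plausible leaf image, so the network sees more variety without any new photographs being collected.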
The rescaling and data augmentation layers were added first, and the neural network was then built on top of them. For feature extraction, six convolutional and pooling layers were added in alternation. The pooling layers reduce the spatial dimensions and therefore the computation time. Two dense layers then map the features extracted by the convolutional layers to the correct class name. Accuracy was used as the evaluation metric during training, and to monitor the model's performance it was computed on the validation dataset during training.
Fig 5
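The shape of the feature maps through the network described above can be traced with a few lines of arithmetic. The sketch assumes six convolution-plus-pooling pairs with 3 x 3 "valid" convolutions and 2 x 2 max pooling, a 256 x 256 x 3 input, the filter counts shown, a 64-unit hidden dense layer and 10 output classes; these sizes are illustrative assumptions, not the project's exact configuration, and the function name is hypothetical.

```python
def trace_model(input_hw=256, in_channels=3, filters=(32, 64, 64, 64, 64, 64),
                kernel=3, pool=2, dense_units=64, num_classes=10):
    """Return the feature-map shape after each conv+pool pair and the total parameters."""
    h = w = input_hw
    c = in_channels
    total = 0
    shapes = []
    for f in filters:
        h, w = h - kernel + 1, w - kernel + 1   # 'valid' convolution, stride 1
        total += (kernel * kernel * c + 1) * f  # conv weights + biases
        c = f
        h, w = h // pool, w // pool             # non-overlapping max pooling
        shapes.append((h, w, c))
    flat = h * w * c                            # flatten before the dense layers
    total += (flat + 1) * dense_units           # hidden dense layer
    total += (dense_units + 1) * num_classes    # output (softmax) layer
    return shapes, total


shapes, total = trace_model()
print(shapes)
print(total)
```

Under these assumptions the spatial size shrinks 256 → 127 → 62 → 30 → 14 → 6 → 2, and the model has 184,202 parameters, most of them in the convolutional layers. Because pooling shrinks the maps so aggressively, the dense layers stay small.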
References
1. Tan Soo Xian and Ruzelita Ngadiran, J. Phys.: Conf. Ser., vol. 1962, 012024, 2021.
2. N. R. Deepa and N. Nagarajan, "Kuan noise filter with Hough transformation based reweighted linear program boost classification for plant leaf disease detection," J. Ambient Intell. Human. Comput., vol. 12, pp. 5979-5992, 2021.
4. N. Kanaka Durga and Ghanta Anuradha, "Plant Disease Identification Using SVM and ANN Algorithms," International Journal of Recent Technology and Engineering (IJRTE), 2019.
5. Qimei W, Feng Q, Minghe S, Jianhua Q and Jie X, "Identification of Tomato Disease Types and Detection of Infected Areas Based on Object Detection Techniques," Article ID 9142753, 2019.
6. T. Kai, J. Li, J. Zeng, E. Asenso and L. Zhang, "Segmentation of tomato leaf images based on adaptive clustering number of K-means algorithm," Computers and Electronics in Agriculture, vol. 165, 2019, doi: 10.1016/j.compag.2019.104962.
7. S. Ramesh et al., "Plant Disease Detection Using Machine Learning," 2018 International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C), 2018, pp. 41-45, doi: 10.1109/ICDI3C.2018.00017.
8. S. S. Chouhan, A. Kaul, U. P. Singh and S. Jain, "Bacterial Foraging Optimization Based Radial Basis Function Neural Network (BRBFNN) for Identification and Classification of Plant Leaf Diseases: An Automatic Approach Towards Plant Pathology," IEEE Access, vol. 6, pp. 8852-8863, 2018, doi: 10.1109/ACCESS.2018.2800685.
9. A. Krizhevsky, I. Sutskever and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," 2017.
10. V. Singh and A. K. Misra, "Detection of plant leaf diseases using image segmentation and soft computing techniques," Information Processing in Agriculture, vol. 4, no. 1, pp. 41-49, 2017, ISSN 2214-3173.