Tomato Disease Prediction Model Using Machine Learning Algorithms and Image Processing Techniques

The document presents a tomato disease prediction model that utilizes machine learning algorithms and image processing techniques to classify tomato leaves as healthy or diseased. It employs methods such as K-means clustering and Otsu thresholding for image segmentation, and various classifiers including CNN, SVM, KNN, decision trees, and random forests for disease classification. Experimental results indicate that the model is effective in predicting tomato leaf health, with the random forest classifier achieving the highest accuracy among the evaluated classifiers.

Tomato Disease Prediction Model using Machine Learning Algorithms and Image Processing Techniques


Pallavi M O
Assistant Professor
MCA Department
Acharya Institute of Technology
Bangalore, Karnataka, India

Anushree Raj
Senior Assistant Professor
MCA Department
Mangalore Institute of Technology and Engineering
Moodabidri, Karnataka, India

2024 6th International Conference on Computational Intelligence and Networks (CINE) | 979-8-3315-1679-6/24/$31.00 ©2024 IEEE | DOI: 10.1109/CINE63708.2024.10881508

Abstract: Tomato cultivation is essential for global agricultural production, but the occurrence of diseases can lead to significant yield losses. To address this challenge, this paper proposes a tomato disease prediction model that leverages machine learning algorithms and image processing techniques. The model aims to accurately classify tomato leaves as healthy or diseased based on visual characteristics. The methodology involves image segmentation and disease classification. Image segmentation is performed using K-means clustering and Otsu thresholding to isolate the tomato leaf region of interest. Various image features, such as color, texture, and shape, are extracted from the segmented regions. For disease classification, a combination of convolutional neural network (CNN), support vector classifier (SVC), k-nearest neighbor (KNN), decision tree, and random forest classifier algorithms is employed. The CNN algorithm can capture complex patterns, while the other algorithms provide alternative approaches. The dataset consists of annotated tomato leaf images, including healthy and diseased samples, divided into training and testing sets. Evaluation metrics such as accuracy, precision, recall, and F1 score are used to assess predictive capabilities. The experimental results demonstrate the model's effectiveness in accurately predicting tomato leaf health status. The random forest classifier achieved the highest accuracy, followed by SVC, KNN, decision tree, and CNN. The model shows promise as a reliable tool for tomato disease detection and management in agricultural practices.

Keywords: Tomato disease prediction, machine learning algorithms, image processing, K-means clustering, Otsu thresholding, convolutional neural network, support vector classifier, k-nearest neighbor, decision tree, random forest classifier.

I. INTRODUCTION

Tomato cultivation plays a crucial role in agriculture, providing food, employment, and economic value. However, tomato plants are susceptible to various diseases, which can lead to reduced quality and quantity of tomato production. This paper focuses on the development of a disease prediction model for tomato leaves using image processing techniques and machine learning algorithms. The proposed model aims to accurately classify tomato leaves as healthy or diseased, providing an efficient and cost-effective solution for disease monitoring. The image processing techniques employed include image segmentation using K-means clustering and Otsu thresholding. Machine learning algorithms, such as support vector machines (SVM), k-nearest neighbors (KNN), decision trees, and random forest, are utilized for disease classification. The model takes advantage of the rich features extracted from the segmented tomato leaf images to make accurate predictions. The study highlights the importance of disease prediction in tomato farming and its impact on crop quality and yield. The proposed model offers a promising approach for disease detection and management in tomato cultivation, benefiting farmers and ensuring sustainable agricultural practices.

II. LITERATURE SURVEY

S. P. Mohanty, D. P. Hughes, and M. Salathé, in their paper [1], explore the application of deep learning techniques for tomato disease detection and classification. The paper discusses the use of convolutional neural networks (CNNs) and transfer learning to identify and categorize tomato diseases from images. The study demonstrates the effectiveness of deep learning models in accurately detecting and classifying various tomato diseases.

H. Liu, Y. Wei, and Z. Song, in their paper [2], propose a multi-scale CNN approach for tomato disease classification. The paper investigates the use of different network architectures to capture multi-scale features from tomato leaf images. The study shows that incorporating multi-scale information enhances the accuracy of disease classification compared to traditional single-scale CNN models.

G. Samanta, A. L. K. Monir, and S. F. Rashid, in their paper [3], focus on automatic identification of tomato diseases using machine learning techniques. The paper explores the application of feature extraction methods and machine learning algorithms for disease identification. The study evaluates the performance of different classifiers and discusses the challenges and potential solutions in tomato disease detection.

A. Atchade, T. E. Gbèhounou, and P. Siarry, in their paper [4], present a machine learning approach for tomato disease detection and classification. It investigates the use

Authorized licensed use limited to: REVA UNIVERSITY. Downloaded on June 19,2025 at 10:23:29 UTC from IEEE Xplore. Restrictions apply.
of support vector machines (SVM) and decision trees to distinguish between healthy and diseased tomato plants. The study demonstrates the effectiveness of the proposed approach in accurately identifying different tomato diseases.

G. P. R. Dhavala, S. Dhruv, and P. A. Dhavala, in their paper [5], propose the use of CNNs for tomato disease detection and classification. The paper discusses the training process, architecture design, and performance evaluation of the CNN model. The study highlights the ability of CNNs to accurately classify tomato diseases, enabling early detection and effective management.

P. M. Gnanaraj and V. S. Raju, in their paper [6], explore the classification of tomato plant diseases using a combination of K-means clustering and artificial neural networks (ANN). The paper discusses the process of feature extraction using K-means clustering and the subsequent training of an ANN for disease classification. The study demonstrates the effectiveness of the proposed approach in accurately identifying and classifying tomato diseases.

Monika Verma, Harish Kundra, and Anuj Sharma, in their paper [7], provide a comprehensive review of tomato disease classification and detection methods using image processing and machine learning techniques. The review covers various approaches, including image segmentation, feature extraction, and machine learning algorithms. The study highlights the strengths and limitations of different methodologies and discusses potential future research directions in this field.

Shweta Jain and Swati Agrawal, in their paper [8], focus on the detection of tomato leaf diseases using machine learning algorithms. The paper reviews different approaches and techniques for feature extraction, image classification, and disease identification. The study evaluates the performance of various machine learning algorithms and highlights the challenges and opportunities in tomato disease detection.

N. S. Rajesh Kumar, S. G. Prakash, and M. Indra Gandhi, in their paper [9], provide a review of image processing techniques used for tomato disease detection. The paper discusses various segmentation, feature extraction, and classification methods applied to tomato leaf images. The study emphasizes the importance of accurate disease detection and highlights the potential of image processing techniques in addressing this challenge.

Vijay S. Bangare, S. V. Charhate, and V. P. Patil, in their paper [10], present a review of machine learning techniques for tomato plant disease detection. The paper explores different approaches, including feature extraction, classification algorithms, and performance evaluation metrics. The study compares the effectiveness of various machine learning methods and discusses the scope for further improvement in tomato disease detection.

S. Jayashree and R. S. Sabeenian, in their paper [11], provide a comprehensive review of machine learning techniques for tomato leaf disease detection. The review covers different aspects, including dataset acquisition, preprocessing, feature extraction, and classification algorithms. The study evaluates the performance of various machine learning models and highlights the challenges and potential solutions in this field.

R. Balakrishnan and R. Dhanapal, in their paper [12], offer a review of machine learning techniques for tomato disease classification. The paper discusses different approaches, including feature extraction, dimensionality reduction, and classification algorithms. The study compares the performance of various machine learning models and provides insights into the strengths and limitations of each approach.

III. METHODOLOGY

The main goal of this research is to create an automatic disease prediction model for tomato leaves using machine learning techniques. The proposed model for predicting and classifying infected tomato leaves consists of five stages, as shown in Figure 1.

Figure 1: Proposed work flow

A. DATA COLLECTION

The dataset was created by gathering tomato leaf images from the internet through computer vision research [13]. The dataset consists of ten classes of tomato leaf images, as in Figure 2, covering various healthy and diseased conditions. It includes images of tomato leaves affected by target spot disease, leaves showing symptoms of bacterial spot disease, healthy leaves without any disease symptoms, and leaves infested with two-spotted spider mites.

Figure 2. Datasets collected

B. IMAGE PRE-PROCESSING

Image pre-processing is a method used to manipulate images in order to enhance their quality or extract valuable information. It falls under the realm of signal processing, where an input image undergoes various operations to produce an output image. In this paper, the OpenCV library is employed to perform image processing tasks. OpenCV reads images in BGR channel order by default, so the images are converted to the RGB format for further processing, as in Figure 3. The dataset consists of both healthy and diseased tomato images, stored as JPG image files on the computer's disk. The image processing tasks were carried out using Jupyter Notebook.

Figure 3: Image pre-processing

C. DATA AUGMENTATION

Data augmentation was used to artificially increase the size and diversity of the training dataset. This can help the model generalize better by exposing it to variations in lighting conditions, angles, and disease severities.

D. DATA SEGMENTATION

After applying data augmentation techniques, the next step in the process is data segmentation. Data segmentation involves converting the image into a collection of pixel regions, which are then represented by a mask or labeled image, as in Figures 4a and 4b. K-means clustering is an iterative algorithm that aims to group the pixels into k clusters, where each pixel belongs to a specific cluster with a particular meaning or characteristic [14]. Otsu's approach, by contrast, is a method used for global thresholding. In this technique, the image is first converted into a grayscale image to simplify the segmentation process. Then, the Otsu method is applied to automatically determine an optimal threshold that separates the image into distinct regions or classes. By combining K-means clustering and Otsu thresholding, the image is segmented into relevant regions, allowing for further analysis and classification of the segmented regions based on their characteristics. This segmentation step plays a crucial role in extracting important features and information from the images for subsequent analysis and disease prediction.

Figure 4a: Data segmentation

Figure 4b: Data segmentation

E. APPLY CLASSIFICATION ALGORITHM

The tomato leaf disease prediction model was built using the following classification algorithms [15]: CNN, KNN, SVM, random forest, and decision tree.

CNN (Convolutional Neural Network)

The convolutional neural network (CNN) belongs to a class of deep learning neural networks. Convolution in the context of a ConvNet is mostly used to extract features from the input images. Convolution preserves the spatial relationship between pixels by learning image features using small filters applied over small regions of the input image. The Sequential model API is a way of building a model by creating an instance of the Sequential class and adding the model's layers to it. For higher accuracy, more layers are added to the CNN. Here,

additional convolutional layers with ReLU activation and a SoftMax activation function are provided. The input shape represents the RGB image format. A MaxPooling layer is then placed after the convolutional layer. Using layers bigger than high-resolution output, we have obstructed images. The optimizers are used to improve overall performance and speed.

Support Vector Machine

Machine learning involves predicting and classifying data using a variety of machine learning algorithms chosen according to the dataset. SVM is a supervised learning method that is widely regarded as performing well compared with traditional classifiers. Supervised learning is machine learning in which the model is trained on input data together with the expected output data. The model used for this project is a Support Vector Machine, and the model creation procedure is as follows. A support vector classifier is created. The data is divided into two categories, training data and testing data. Training data is used to train the model, while testing data is used to evaluate the trained model. The resulting model can then be used to classify new data.

a. KNN (K-nearest Neighbour)

KNN is an algorithm frequently used for both classification and regression problems. Here, all data points are placed in the feature space, and the nearest neighbors of the data point to be classified are identified by calculating the distance between the input data point and every other data point. The data points closest to the input are then selected, their classes are obtained, and the predicted class of the input is the class that occurs most often among them. Here, K is the number of nearest neighbors.

b. Decision Tree

The purpose of using a Decision Tree is to create a learning model that can be applied to predict the value of the target variable by learning decision rules inferred from the input data. With Decision Trees, we start at the root of the tree to predict the class label for a record. The value of the root attribute is compared with the value of the record's attribute. Based on the comparison, we follow the branch corresponding to that value and move on to the next node. A decision tree is a tree-based classification technique that is frequently used in data processing to categorize the input dataset into predefined classes. Here, the decision tree approach is used to train the image data model to perform supervised machine learning.

c. Random Forest

Random forest is a supervised learning algorithm that can be used for both classification and regression. It is also a flexible and easy-to-use algorithm. A forest is made up of trees. The random forest algorithm builds decision trees on randomly selected data samples and obtains a prediction from each tree. Random samples are picked from the dataset, a decision tree is built for each sample, and a prediction result is obtained from each decision tree. For the final prediction, the result predicted by the most trees is chosen.

IV. RESULTS

The experiment for predicting healthy and diseased tomato leaves was performed on the tomato leaf images that were collected and processed. A total of 2020 images were used in this experiment, of which 80% were used for training and 20% for testing. Images were resized to 250x250 pixels to improve computation speed. Image preprocessing techniques, together with K-means clustering and thresholding, were applied, and the available classifiers were then evaluated. Tomato leaf images were classified using the CNN, KNN, SVM, Decision Tree, and Random Forest algorithms. CNN delivers 83% accuracy. KNN delivers 90% accuracy. The Support Vector Machine algorithm also delivers 90% accuracy. Decision Tree delivers 84% accuracy. The Random Forest algorithm delivers the highest accuracy, 94%. The accuracy [16] and other metric scores achieved by all classifiers are as follows:

Figure 5. Performance metrics for different models
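The accuracy, precision, recall, and F1 score reported for each classifier are all derived from confusion-matrix counts. As a hedged illustration, the minimal Python sketch below applies the standard definitions of these metrics to the random forest counts listed in Table 1 (TP = 958, TN = 937, FP = 78, FN = 47). The function name `classification_metrics` is an assumption made for this example and is not taken from the paper's implementation.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Random forest counts as listed in Table 1 (illustrative input only).
rf_metrics = classification_metrics(tp=958, tn=937, fp=78, fn=47)

for name, value in rf_metrics.items():
    print(f"{name}: {value:.3f}")
```

For these counts the sketch yields an accuracy of about 0.938 and a precision of about 0.925, as in Table 1; the recall and F1 values it produces can differ from the tabulated figures in the third decimal place.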

TABLE 1. RESULTS BASED ON CONFUSION MATRIX FOR EACH ALGORITHM

            CNN     SVM     KNN     DT      RF
TP          851     933     927     864     958
TN          829     909     901     835     937
FP          221     97      107     175     78
FN          119     91      85      146     47
Accuracy    0.832   0.907   0.905   0.84    0.938
Precision   0.794   0.906   0.897   0.83    0.925
Recall      0.874   0.909   0.914   0.85    0.952
F1 Score    0.832   0.907   0.905   0.84    0.938

Table 1 displays the confusion-matrix values for the different models, and the performance metrics compiled for each model, as shown in Figure 5, are analyzed [17]. The metrics are computed as follows:

Accuracy = (TP + TN) / (TP + FP + TN + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

V. CONCLUSION

In conclusion, this research paper presents a tomato disease prediction model that utilizes machine learning algorithms and image processing techniques to accurately classify tomato leaves as healthy or diseased. The model incorporates various steps, including image pre-processing, data augmentation, data segmentation using K-means clustering and Otsu thresholding, and feature extraction. The classification is performed using Convolutional Neural Network (CNN), k-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest, and Decision Tree algorithms.

The model's performance was assessed using evaluation metrics such as accuracy, precision, recall, and F1 score, demonstrating its ability to accurately predict tomato leaf health status. The integration of image processing techniques and machine learning algorithms proved valuable in detecting and classifying various tomato diseases. The developed model holds promise as a reliable tool for tomato disease detection and management in agricultural practices. It provides farmers with a cost-effective and efficient solution for monitoring and early detection of diseases, enabling timely interventions and improved crop productivity. Further research can be conducted to enhance the model's performance, explore additional algorithms, and validate the model's efficacy in real-world scenarios.

Overall, the Tomato Disease Prediction Model presented in this paper showcases the potential of machine learning and image processing techniques in improving disease management strategies, contributing to the sustainability and success of tomato cultivation.

REFERENCE

[1] S. P. Mohanty, D. P. Hughes, and M. Salathé. "Deep Learning Approaches for Tomato Disease Detection and Classification." In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), 2016
[2] H. Liu, Y. Wei, and Z. Song. "Tomato Disease Classification using Multi-scale Convolutional Neural Networks." In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
[3] G. Samanta, A. L. K. Monir, and S. F. Rashid. "Automatic Identification of Tomato Diseases by Machine Learning Techniques." In Proceedings of the IEEE Region 10 Symposium (TENSYMP), 2017
[4] A. Atchade, T. E. Gbèhounou, and P. Siarry. "A Machine Learning Approach to Tomato Disease Detection and Classification." In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), 2018
[5] G. P. R. Dhavala, S. Dhruv, and P. A. Dhavala. "Tomato Disease Detection and Classification using Convolutional Neural Networks." In Proceedings of the IEEE Region 10 Humanitarian Technology Conference (R10-HTC), 2018
[6] P. M. Gnanaraj and V. S. Raju. "Tomato Plant Diseases Classification using K-means Clustering and Artificial Neural Networks." In Proceedings of the IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), 2019
[7] Monika Verma, Harish Kundra, and Anuj Sharma. "Tomato Diseases Classification and Detection using Image Processing and Machine Learning Techniques: A Review." International Journal of Computer Science and Information Security (IJCSIS), 2019
[8] Shweta Jain and Swati Agrawal. "Tomato Leaf Disease Detection using Machine Learning Algorithms: A Review." International Journal of Innovative Technology and Exploring Engineering (IJITEE), 2020
[9] N. S. Rajesh Kumar, S. G. Prakash, and M. Indra Gandhi. "A Review on Tomato Disease Detection using Image Processing Techniques." International Journal of Pure and Applied Mathematics (IJPAM), 2020
[10] Vijay S. Bangare, S. V. Charhate, and V. P. Patil.

"Tomato Plant Disease Detection using Machine
Learning Techniques: A Review." International
Journal of Engineering Research & Technology
(IJERT), 2020
[11] S. Jayashree and R. S. Sabeenian. "A
Comprehensive Review on Tomato Leaf Disease
Detection using Machine Learning Techniques."
International Journal of Engineering and Advanced
Technology (IJEAT), 2021
[12] R. Balakrishnan and R. Dhanapal. "A Review on
Machine Learning Techniques for Tomato Disease
Classification." International Journal of Recent
Technology and Engineering (IJRTE), 2021
[13] A Raj, P. M O, V. Y, "Deep Learning Based
Application in Detecting Wrinkle and Predicting
Age," 2023 International Conference on Intelligent
and Innovative Technologies in Computing,
Electrical and Electronics (IITCEE), Bengaluru,
India, 2023, pp. 1168-1173, doi:
10.1109/IITCEE57236.2023.10090987.
[14] A Raj, Rio D’Souza, Scalable Two-Phase Top-Down
Specification for Big Data Anonymization
Using Apache Pig, International Journal of Science
and Engineering Development Research, January
2021, volume 1133, Book series, pg 1009-1023,
Springer publication,
https://link.springer.com/chapter/10.1007%2F978-981-15-3514-7_75
[15] A Raj and R D'Souza, "Development of Big data
anonymization framework using DNA Computing,"
2022 International Conference on Artificial
Intelligence and Data Engineering (AIDE), Karkala,
India, 2022, pp. 125-130, IEEE Xplore publication,
doi:10.1109/AIDE57180.2022.10059751.
[16] Anushree Raj and Rio D’Souza, "Performance
Metrics Evaluation Towards The Effectiveness of
Data Anonymization," 2023 IEEE 8th International
Conference for Convergence in Technology (I2CT),
Lonavla, India, 2023, pp. 1-5, doi:
10.1109/I2CT57861.2023.10126310.
[17] Anushree Raj, Dr Rio D’Souza, Data Utility
Evaluation in K-Anonymization through
Classification Accuracy for Privacy Preserving,
Indian Journal of Natural Sciences, February 2023,
Volume: 13, Issue 76, pg 51909 – 51914, TNSRO
publication.
[18] Anushree Raj, Rio D’Souza, A Review on Machine
Learning Algorithms, International Journal for
Research in Applied Science and Engineering
Technology, June 2019, Volume 7 Issue 4, pg 792-
796, IJRASET,
https://www.ijraset.com/fileserve.php?FID=23643.
