5. Data-Driven Smart Farming to Grade and Classify Tomatoes Using
Volume 1, 2024
ABSTRACT
Identifying images poses a challenge in computer vision, but the use of deep learning methods has greatly
enhanced the performance of image classification systems. In this research, Convolutional Neural Networks
(CNN) and Feed Forward Neural Networks (FFNN) have been utilized for image classification. CNN is highly
effective in image classification: it extracts relevant information from images using convolutional layers
and uses pooling layers to reduce the dimensionality of the derived features. FFNN is a classic neural
network with fully connected layers and can be used to further process the features extracted by CNN. The
study trains CNN and FFNN models on a large dataset of tomato images to categorize them based on
their type, ripeness, and damage status. CNN is found to be more effective than FFNN for tomato
classification in all the use cases. The accuracy for classification of an image (tomato
or not) using CNN is 95.83%, type classification using CNN is 81.52%, whereas using FFNN is 66.30%;
ripeness grading for CNN is 92.86%, whereas for FFNN it is 57.14%; and damage status grading is 92.86%
using CNN and 67.86% using FFNN. Therefore, it can be concluded that quality processing of tomatoes can
be improved using CNN.
Keywords: Image classification, tomatoes, CNN, FFNN
There are more than 45 types and varieties of tomatoes. This research, however, broadly classifies
them into three basic types by shape and size: Classic Tomatoes (Regular-Sized Tomatoes), Cherry
Tomatoes (Mini Tomatoes), and Beefsteak Tomatoes (Large Tomatoes).

[...] of the minor distinctions between ripe and unripe tomatoes, even the Human Visual System has
difficulty distinguishing between them. In order to effectively classify tomato ripeness, methods based
on digital images must overcome nonlinear obstacles. The research cited here proposes using a Support
Vector Machine (SVM) classifier to differentiate ripe and unripe fruits, and a Multiclass Support Vector
Machine (MSVM) classifier to detect faults. Because of its simple testing setup and dependability, the
proposed method is well suited for incorporation into the tomato supply chain. Implementing this approach
may result in enhanced early differentiation of tomatoes in the value chain, which will benefit the
producers (Kumar et al., 2020).
(Momeny et al., 2020).

A 2015 study on “Sun Bright” tomatoes investigated how ripeness affects the quality of tomatoes for
both processing and consumption. The purpose was to learn how the optical properties of tomatoes,
specifically their absorption and scattering properties, altered as they ripened. A total of 281 “Sun
Bright” tomatoes at various stages of ripeness were studied using hyperspectral imaging. The study’s goal
was to create classification models for tomato maturity based on optical absorption (μa) and scattering
(μs’) spectra. The study attempted to categorize tomatoes into six or three maturity groups by using
Partial Least Squares-Discriminant Analysis (PLS-DA) models built on these optical characteristics, used
both individually and in combination (μa & μs’, eff) (Zhu et al., 2015).

Another study in the early 2000s discussed the use of color image processing to assess tomato quality
and maturity. RGB and L*a*b* color schemes were employed for the picture analysis. The findings were
as follows: the radical regression curve of G(36) was judged to be 70% correct on average, the pixel
count of G(36) showed the highest correlation coefficient with tomato maturity, the a* value rose
with maturation while the b* value did not change significantly, and the average value of a* for the
upper surface can be used as the maturity index (Gejima et al., 2004).

Recent advances in computer vision have enabled new agricultural applications, most notably accurate
yield estimation for improved harvesting, marketing, and logistics planning. A method for categorizing
fresh market tomatoes (Roma and Pear varieties) based on their maturity levels (green, orange, and
red) was studied. Color features were combined with a backpropagation neural network (BPNN)
classification algorithm in this approach. To capture tomato images, a computer vision-based device
was developed, and image processing techniques were employed to isolate tomato targets. The area
for color feature extraction was determined as the largest inscribed circle on the tomato’s surface,
which was divided into five concentric circles for this purpose. The tomato’s maturity level was
represented by the average hue values from each sub-region. Subsequently, these color characteristics
were utilized as inputs in the BPNN to ascertain the maturity levels of the tomato samples (Wan et al.,
2018).

After studying many relevant research papers and articles, it was found that many algorithms, such
as CNN, SVM, LDA, and PCA, have been used in the field of smart farming and image classification to
classify fruits such as banana, pear, tomato, and cherry. Pre-trained CNN models, such as VGG16, VGG19,
and ResNet101, were also used in various studies. Of all the algorithms, CNN was found to be a better
choice for the research purpose.

Research methodology

The datasets used for training were collected from GitHub and supplemented with real-time images; for
testing, real-time images were used. The models were trained on around 10,000 tomato images of
different types, ripeness, and damage status.

Figure 3. Training images

Artificial neural network (ANN)

An artificial neural network (also known as a neural network) is a computer model of how nerve cells
in the human brain work. Artificial neural networks (ANNs) use learning procedures to update their
answers on their own or to learn when fresh data is presented to them. An artificial neural network has
three or more linked layers. The first layer is made up of neurons that are used as input. These neurons
send information to deeper layers, which send the final output information to the final output layer
(Rouse, 2023). The numerical values that connect the neurons are referred to as weights. The weights
between neurons determine the neural network’s learning capacity. As artificial neural networks learn,
the weights change. Weights are initially assigned at random. The “activation function” is
used to standardize the output of neurons (Artificial
neural network - Applications, algorithms and examples, n.d.).

Convolutional neural network

A CNN is a deep learning neural network designed for processing structured arrays of data, such as
photos. The convolutional layer is a special type of layer that gives convolutional neural networks
their strength. A convolutional neural network is a multi-layered feed-forward neural network formed
by progressively stacking many hidden layers on top of one another, allowing it to acquire hierarchical
features due to its sequential development (LeCun & Benaissa, n.d.).

CNN has three main layers, namely, Convolutional layer, Pooling layer, and Fully-connected (FC) layer.

Convolutional layer

The convolutional layer is the central component of a CNN, and it is also where the majority of the
processing occurs. The only components required are input data, a filter, and a feature map. Assume that
the input is a color image made up of a 3D pixel matrix. As a result, the input will have three
dimensions: height, width, and depth, where the depth corresponds to the RGB channels of the picture.
In addition, we have a feature detector, also known as a kernel or filter, which will traverse the
image’s receptive fields and assess whether or not the feature is there (Ratan, 2020).

Pooling layer

In convolutional neural networks, the feature map formed by a preceding convolutional layer and a
non-linear activation function is often utilized as the basis for pooling. The essential phases of the
pooling process are quite similar to those of the convolution procedure. You select a filter and place
it over the output feature map of the previous convolutional layer. Based on the type of pooling
operation you select, the pooling filter determines the output on the receptive field (the region of
the feature map beneath the filter). The most commonly used strategies are max-pooling and average
pooling (What is pooling in a convolutional neural network (CNN): Pooling layers explained, 2021).

Fully connected layer

Neural networks are made up of a collection of interdependent non-linear functions. Each function
is carried out by a single neuron (or perceptron). In fully connected layers, the neurons transform
the input vector linearly using a weight matrix. The result is then transformed nonlinearly using a
nonlinear activation function. Each element of the input vector influences every element of the output
vector. However, not all weights have an effect on all outputs (Unzueta, 2022).
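
The convolutional, pooling, and fully connected layers described in this section can be illustrated
with a minimal pure-Python sketch. It is not the architecture used in this study: the single-channel
6x6 input (instead of an RGB image), the edge-detecting kernel, the ReLU and softmax activations, the
three output classes, and the random weights are all assumptions chosen only to keep the example small.

```python
import math
import random


def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so this is strictly a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Non-linear activation applied element-wise (assumed: ReLU)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling over size x size windows."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum per output neuron."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Turn raw scores into class probabilities (assumed output activation)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 6x6 grayscale patch with a vertical edge in the middle.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
# Hypothetical 3x3 kernel that responds to dark-to-bright vertical edges.
kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]

features = relu(conv2d(image, kernel))  # 4x4 feature map
pooled = max_pool(features, size=2)     # 2x2 map after pooling
flat = [v for row in pooled for v in row]

# Weights start out random, as described for ANNs; training is omitted here.
random.seed(0)
num_classes = 3  # hypothetical, e.g. three tomato types
weights = [[random.uniform(-0.5, 0.5) for _ in flat] for _ in range(num_classes)]
biases = [0.0] * num_classes
probs = softmax(dense(flat, weights, biases))
print(features[0], pooled, probs)
```

The feature map is non-zero only where the kernel overlaps the edge, pooling halves each spatial
dimension, and the fully connected layer maps the flattened features to one score per class.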
Figure 9. Confusion matrix for image classification (tomato or not) using CNN
Figure 10. Confusion matrix for type classification using CNN
Figure 11. Confusion matrix for type classification using FFNN
Figure 12. Confusion matrix for grading damage status using CNN
Figure 15. Accuracy curve for image classification (tomato or not) using CNN
Figure 16. Confusion matrix for grading ripeness using FFNN
Figure 17. Accuracy curve for type classification using CNN
Figure 18. Accuracy curve for type classification using FFNN
Figure 19. Accuracy curve for grading ripeness using CNN
Figure 20. Accuracy curve for grading ripeness using FFNN
Figure 21. Accuracy curve for grading damage status using CNN
Figure 22. Accuracy curve for grading damage status using FFNN
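
An accuracy figure can be read off a confusion matrix with a short calculation, sketched below.
The sketch assumes the usual definition of accuracy (correct predictions on the diagonal divided by
all predictions); the three-class counts are hypothetical and are not taken from the figures of this
study.

```python
def accuracy(confusion):
    """Overall accuracy: diagonal (correct) counts divided by all counts."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Share of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(confusion)]


# Hypothetical counts: rows are true classes, columns are predicted classes.
confusion = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 0, 10],
]

print(f"accuracy = {accuracy(confusion):.2%}")
print("recall per class =", per_class_recall(confusion))
```

Per-class recall is included because a single accuracy value can hide a class that is classified
poorly, which the rows of a confusion matrix make visible.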