Tomato Quality Classification Based on Transfer Learning
ABSTRACT The demand for high-quality tomatoes to meet consumer and market standards, combined with large-scale production, has necessitated the development of an inline quality grading system, since manual grading is time-consuming, costly, and labor-intensive. This study introduces a novel approach for tomato quality sorting and grading, focusing specifically on the color features of tomato images. The method leverages pre-trained convolutional neural networks (CNNs) for feature extraction and traditional machine learning algorithms for classification (a hybrid model). The single-board computer NVIDIA Jetson TX1 was used to create a tomato image dataset. Image preprocessing and fine-tuning techniques were applied to enable the deep layers to learn and concentrate on complex and significant features. The extracted features were then classified using traditional machine learning algorithms, namely support vector machine (SVM), random forest (RF), and k-nearest neighbors (KNN) classifiers. Among the proposed hybrid models, the CNN-SVM method outperformed the other hybrid approaches, attaining an accuracy of 97.50% in the binary classification of tomatoes as healthy or rejected and 96.67% in the multiclass classification of tomatoes as ripe, unripe, or rejected when Inceptionv3 was used as the feature extractor. When a second, public dataset was used, the proposed hybrid CNN-SVM model achieved an accuracy of 97.54% in categorizing tomatoes as ripe, unripe, old, or damaged, again outperforming the other hybrid models with Inceptionv3 as the feature extractor. The accuracy, recall, precision, specificity, and F1-score of the best-performing proposed hybrid model were evaluated.
INDEX TERMS Grading, feature extraction, machine learning algorithms, tomato, image preprocessing.
The study in [17] proposed a system that employs image processing techniques for vegetable and fruit quality grading and classification. The system classifies produce by extracting external features (color and texture). A comparative study of two machine learning algorithms for vegetable and fruit classification was presented in [12]; the SVM achieved a superior accuracy of 94.3% over the KNN. The study in [18] introduced a hybrid model for weed identification in winter rape fields. Among the five models compared, the hybrid model combining VGGNet with SVM achieved the highest accuracy of 92.1%. The study in [19] performed similar work to [18] by adding a residual filter network for feature enhancement to the hybrid network (CNN-SVM), with the proposed model attaining an overall accuracy of 99%, outperforming the others.

Classification of appearance quality in red grapes using transfer learning with convolutional neural networks was introduced in [20]. The investigation employed the transfer learning technique with four pre-trained networks, namely VGG19, Inceptionv3, GoogleNet, and ResNet50. Notably, ResNet50 demonstrated the highest accuracy, reaching 82.85% in categorizing red grapes into three distinct quality categories. To further enhance performance, the research employed ResNet50 for feature extraction and SVM for classification. The ResNet50-SVM model excelled, achieving the highest accuracy of 95.08% in categorizing red grapes into the aforementioned categories.

In a study conducted by [13] within the domain of fruit quality assessment, a comparative analysis was undertaken involving CNNs and Vision Transformers (ViT). The findings revealed that the CNN model demonstrated superior performance compared to the ViT model. Specifically, the CNN model achieved a remarkable accuracy of 95% in classifying apple fruits into four distinct classes and banana fruits into two classes, surpassing the accuracy of 93% attained by the ViT model. The study in [21] performed similar work to [13] for olive disease classification, employing CNN and ViT models for feature extraction and utilizing SoftMax as a classifier. The study achieved notable results by employing a fusion of ViT and VGG-16 for feature extraction, attaining an average accuracy of 97% for binary and multiclass classification. However, according to [22], ViT requires a large training dataset of more than 14 million images to outperform CNN models such as ResNet.

A system for tomato grading based on machine vision was presented in [23]. The system attained an average accuracy of 95% for detecting the calyx and stalk on tomatoes and an accuracy of 97% in classifying tomatoes into healthy and defective classes, with radial basis function support vector machines (RBF-SVM) outperforming the other classifiers. In this study, however, it was observed that the training set was also used for testing. In [24], the study presented an advanced tomato detection algorithm that utilizes an enhanced hue, saturation, and value (HSV) color space and watershed techniques for efficient fruit separation. The algorithm achieved an overall accuracy of 81.6% in red fruit detection.

The study in [15] proposed tomato ripeness detection and classification using VGG-based CNN models. The proposed model involved transfer learning and fine-tuning of VGG-16 to classify tomatoes into ripe and unripe classes, attaining an average accuracy of 96%. The study in [25] introduced an automated grading system designed to assess tomato ripeness through deep learning. The investigation leveraged transfer learning with ResNet18, achieving an average validation accuracy of 93.85% in categorizing tomatoes into ripe, under-ripe, and over-ripe classes.

In [26], the study suggested a system for classifying tomatoes into three categories based on maturity: immature, partially mature, and mature. The system applied deep transfer learning with five pre-trained models; among them, VGG-19 demonstrated the highest accuracy of 97.37%. A tomato classification system based on size was presented in [27], utilizing thresholding, machine learning, and deep learning techniques based on area, perimeter, and enclosed-circle-radius features. The machine learning techniques showed the best performance, with an average accuracy of 94.5% achieved by the SVM. In [28], the study developed an automatic tomato detection method in which the SVM algorithm was used as a classifier, achieving an average accuracy of 90% and outperforming the other methods. A computer vision-based grading and sorting system that separates tomatoes into defective and non-defective classes was presented in [29]. The task was performed using a backpropagation neural network, attaining an accuracy of 92%.

III. MATERIALS AND METHODS
In this section, we present the proposed framework for tomato classification, as shown in Figure 2. The proposed system includes five major sections: image acquisition, preprocessing, transfer learning, feature extraction, and classification.

A. IMAGE ACQUISITION
The images were acquired in an uncontrolled lighting environment, as illustrated in Figure 4, with a fixed distance of 20 cm between the table surface and the camera. The acquisition system employed a single-board computer, the NVIDIA Jetson TX1 [30], with an onboard camera equipped with an RGB sensor capable of capturing images in JPEG format at 640 x 480 pixels resolution. The system operated at 30 frames per second, capturing tomato images one at a time. A total of 600 tomatoes with different shapes, sizes, colors, and defects, collected from the local market, were washed and dried. The image acquisition involved capturing four images from each tomato (the four sides of the tomato, i.e., top, bottom, rear, and front, as shown in Figure 3), resulting in a total of 2400 images. The idea of capturing four images of each tomato is to mimic the concept of scanning a tomato rolling on rotating mechanical support such as conveyor belts or rollers.
FIGURE 2. Block diagram of the proposed system: image acquisition → image preprocessing → processed images → transfer learning approach → feature extraction → classifier selection → classification into categories → performance evaluation.
B. IMAGE PREPROCESSING
Each color channel was thresholded to obtain a binary mask,

B(x, y) = 1 if p(x, y) > thr, and 0 otherwise, (1)

where B(x, y), p(x, y), thr, and (x, y) represent the binary image, the image pixel, the threshold pixel value, and the coordinates of the pixel, respectively. The "OR" logical operator fused the generated masks, followed by morphological operations to remove gaps and holes and obtain a final mask that covers both color features. In addition, we performed background cancellation by multiplying the original image with its respective binary image (mask), which helps isolate the specific region of interest (ROI). The resulting images were then used for training. Figure 5 demonstrates the preprocessing techniques investigated in our work for a tomato image with a plain background, and Figure 6 demonstrates the capability of the preprocessing techniques to segment a tomato image against a complex background of rollers, similar to the rotating mechanical support in an industrial setup.
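The masking pipeline above can be sketched in MATLAB with Image Processing Toolbox functions; the color space, hue thresholds, and structuring-element size below are illustrative assumptions rather than the exact values used in this work.

```matlab
% Illustrative segmentation sketch: threshold two color masks, fuse them,
% clean up with morphological operations, and cancel the background.
img = imread('tomato.jpg');
hsv = rgb2hsv(img);                       % hue in [0, 1]

% Two illustrative hue masks (red-ish and green-ish regions; assumed ranges).
maskRed   = hsv(:,:,1) < 0.08 | hsv(:,:,1) > 0.92;
maskGreen = hsv(:,:,1) > 0.20 & hsv(:,:,1) < 0.45;

% Fuse the masks with a logical OR, then remove gaps and holes.
mask = maskRed | maskGreen;
mask = imclose(mask, strel('disk', 5));   % close small gaps
mask = imfill(mask, 'holes');             % fill interior holes

% Background cancellation: multiply the image by its binary mask.
segmented = img .* uint8(repmat(mask, [1 1 3]));
imshow(segmented);
```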
C. DATA AUGMENTATION
Data augmentation is a machine learning and computer vision technique for increasing the training data available for a model to learn from [34]. It applies various transformations to the existing data to create new, slightly altered versions of the original data. Data augmentation aims to improve the generalization and robustness of machine learning models by exposing them to a more extensive and diverse set of examples during training. By introducing variations in the training data, the model learns to be more flexible and adaptable and to handle new and unseen inputs during inference. We augmented our training set by applying rotation, reflection, translation, and scaling.
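A possible MATLAB configuration of the four augmentations named above is sketched below; the parameter ranges and folder layout are illustrative assumptions.

```matlab
% Sketch of the four augmentations (rotation, reflection, translation,
% scaling); the ranges are assumed values, not the exact ones used here.
augmenter = imageDataAugmenter( ...
    'RandRotation',     [-30 30], ...    % degrees
    'RandXReflection',  true, ...
    'RandXTranslation', [-10 10], ...    % pixels
    'RandYTranslation', [-10 10], ...
    'RandScale',        [0.9 1.1]);

imds = imageDatastore('dataset/train', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
augimdsTrain = augmentedImageDatastore([299 299 3], imds, ...
    'DataAugmentation', augmenter);      % 299x299 matches Inceptionv3 input
```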
D. TRANSFER LEARNING
Transfer learning [35] addresses the challenges associated with training deep learning networks when the amount of available data is limited. Instead of starting from scratch, transfer learning uses pre-trained deep learning networks customized for the specific task. The pre-trained network can be utilized as a feature extractor or for end-to-end classification tasks by carefully adjusting some parameters. This study used a pre-trained network as a feature extractor and classified the extracted features using traditional machine learning classifiers. We evaluated four pre-trained networks, namely MobileNetv2 [36], Inceptionv3 [37], ResNet50 [38], and AlexNet [39], for transfer learning in our proposed approach. We analyzed the performance of these networks in terms of accuracy before deploying them for feature extraction. We then proceeded to fine-tune the selected pre-trained networks. Fine-tuning aims to teach the network to recognize classes it was not trained on before. It involves replacing the final learnable and classification layers of the adopted pre-trained network so that they match the number of classes in the classification task at hand. To prevent overfitting, we froze some network layers and applied batch normalization and dropout during training.
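The layer-replacement and freezing steps can be sketched as follows in MATLAB; the layer names follow the Deep Learning Toolbox Inceptionv3 model, while the frozen-layer count, the datastores augimdsTrain and augimdsVal, and the optimizer choice are assumptions. The learning rate, minibatch size, and epoch count match the values reported in Section IV-A.

```matlab
% Sketch of fine-tuning Inceptionv3 for a new class count.
net = inceptionv3;                        % pretrained-model support package
lgraph = layerGraph(net);
numClasses = 3;                           % e.g., ripe / unripe / reject

% Replace the final learnable and classification layers.
lgraph = replaceLayer(lgraph, 'predictions', ...
    fullyConnectedLayer(numClasses, 'Name', 'fc_new'));
lgraph = replaceLayer(lgraph, 'ClassificationLayer_predictions', ...
    classificationLayer('Name', 'output_new'));

% Freeze early layers to reduce overfitting (the count is illustrative).
layers = lgraph.Layers;
for i = 1:numel(layers)
    if i <= 100 && isprop(layers(i), 'WeightLearnRateFactor')
        layers(i).WeightLearnRateFactor = 0;
        layers(i).BiasLearnRateFactor = 0;
        lgraph = replaceLayer(lgraph, layers(i).Name, layers(i));
    end
end

opts = trainingOptions('sgdm', 'InitialLearnRate', 0.005, ...
    'MiniBatchSize', 32, 'MaxEpochs', 60, 'ValidationData', augimdsVal);
netFT = trainNetwork(augimdsTrain, lgraph, opts);
```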
E. FEATURE EXTRACTION
In our proposed system, the selected pre-trained networks were used to extract deep features from the last convolutional layers or dense layers of the network [40]. Usually, these networks are designed with more convolutional layers to increase network performance. A set of weight layers cascades with another layer, separated by an activation layer such as a rectified linear unit (ReLU). Thus, features were extracted from the deepest layer of the network and used to train traditional machine learning classifiers.
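A minimal sketch of this step, assuming the fine-tuned Inceptionv3 from the previous section; the layer name 'avg_pool' and the test datastore augimdsTest are assumptions (analyzeNetwork(netFT) lists the exact layer names).

```matlab
% Sketch: extract deep features from a late layer and reuse them for a
% classical classifier; one feature row per input image.
featTrain = activations(netFT, augimdsTrain, 'avg_pool', ...
    'OutputAs', 'rows');
featTest  = activations(netFT, augimdsTest, 'avg_pool', ...
    'OutputAs', 'rows');
```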
F. CLASSIFIERS
Machine learning classifiers [41] are algorithms trained on labeled datasets to learn patterns and relationships between input features and corresponding output labels. The goal of a machine learning classifier is to predict the correct label for a new input based on the relationships learned from the training dataset. Machine learning classifiers include decision trees, random forests, logistic regression, support vector machines, k-nearest neighbors, and neural networks. The choice of a machine learning classifier depends on the specific characteristics of the dataset and the desired performance metrics. A labeled dataset is typically divided into training and validation sets to train a machine learning classifier. The training set is used to train the classifier, and the validation set is used to evaluate the classifier's performance on new, unseen data. The classifier's performance is measured using accuracy, precision, recall, specificity, and F1-score metrics. This study uses support vector machine, random forest, and k-nearest neighbors classifiers, with principal component analysis for feature dimensionality reduction.
1) Support vector machine (SVM)
The support vector machine [28] is highly effective in image classification and regression tasks because it can handle dimensionality issues. The algorithm employs support vectors to determine the coordinates of individual observations. It can produce accurate outcomes even in high-dimensional spaces, such as when the number of dimensions exceeds the number of samples.
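A hedged sketch of the CNN-SVM hybrid trained on the extracted deep features; the kernel choice and the label source imdsTrain.Labels are assumptions.

```matlab
% Sketch of the CNN-SVM hybrid: an SVM trained on the deep features.
labelsTrain = imdsTrain.Labels;           % labels of the training images
svmBinary = fitcsvm(featTrain, labelsTrain, ...
    'KernelFunction', 'linear');          % binary case: healthy vs reject

% For the multiclass case (ripe / unripe / reject), wrap binary SVMs:
svmMulti = fitcecoc(featTrain, labelsTrain, ...
    'Learners', templateSVM('KernelFunction', 'linear'));
predLabels = predict(svmMulti, featTest);
```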
2) Random forest (RF)
The random forest classifier is a popular machine learning algorithm that belongs to the ensemble learning family of methods [41]. It is widely used for classification and regression tasks in which the goal is to make accurate predictions. The random forest algorithm builds an ensemble of decision trees by randomly selecting a subset of features and data samples to train each tree. During training, the trees learn to classify the input based on splitting rules. When making a prediction, the input is passed through each tree, and the final prediction is determined by aggregating the individual tree predictions, often using majority voting. Random forest is known for its ability to handle high-dimensional data, provide accurate predictions, and identify important features for the task at hand.
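A comparable sketch of the CNN-RF hybrid using MATLAB's bagged-tree ensemble TreeBagger; the number of trees is an illustrative assumption.

```matlab
% Sketch of the CNN-RF hybrid: a bagged ensemble of decision trees.
rfModel = TreeBagger(100, featTrain, labelsTrain, ...
    'Method', 'classification', 'OOBPrediction', 'on');
predRF = predict(rfModel, featTest);      % returns a cell array of labels
```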
3) K-nearest neighbors (KNN)
The k-nearest neighbors (KNN) algorithm is a machine learning algorithm for classification and regression tasks. This simple but effective algorithm finds the k nearest data points in the dataset to a new data point and then assigns the majority class label among those neighbors as the predicted label for the new data point [42]. KNN works well with linear and non-linear decision boundaries and can be applied to various problem domains such as image recognition, text classification, and recommendation systems. However, KNN can be sensitive to the choice of K and the distance metric used, and it may be computationally expensive for large datasets. The distance metric used to measure the similarity between data points can vary depending on the problem domain. KNN may not handle large datasets, such as those in image classification, so it is recommended to apply principal component analysis [43] for feature dimensionality reduction before utilizing the KNN classifier, as sketched below.
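A hedged sketch of the CNN-KNN hybrid with the recommended PCA step; the number of retained components and the value of K are assumptions.

```matlab
% Sketch of the CNN-KNN hybrid with PCA for dimensionality reduction.
[coeff, scoreTrain] = pca(featTrain);     % project features onto PCs
numComp = 50;                             % illustrative component count
knnModel = fitcknn(scoreTrain(:, 1:numComp), labelsTrain, ...
    'NumNeighbors', 5, 'Distance', 'euclidean');

% Apply the same (centered) projection to the test features.
scoreTest = (featTest - mean(featTrain)) * coeff(:, 1:numComp);
predKNN = predict(knnModel, scoreTest);
```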
IV. RESULTS AND DISCUSSION
A. EXPERIMENTAL SETUP
The training and testing of the proposed model were performed on a PC with 128 GB of RAM, an NVIDIA GeForce RTX 2080 Titan with 11 GB of memory, and an Intel(R) Xeon(R) Silver 4114 CPU at 2.20 GHz, using the MATLAB R2022b release. Our experiments deployed four pre-trained networks, namely MobileNetv2 [36], Inceptionv3 [37], ResNet50 [38], and AlexNet [39], for feature extraction. These features were then classified using three traditional machine learning classifiers, namely SVM, RF, and KNN. A dataset of 2400 tomato images was used in our experiments, with the first batch having healthy and reject classes and the second batch having ripe, unripe, and reject classes, as described in Table 1. The dataset was randomly divided into 70% (1680 images) for training, 10% (240 images) for validation, and 20% (480 images) for testing. The standard hyperparameters used in this experiment were a 0.005 learning rate, a minibatch size of 32, and 60 epochs.
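A minimal sketch of this stratified 70/10/20 split; the folder layout is an assumption.

```matlab
% Sketch: split the image datastore 70/10/20 while keeping class ratios.
imds = imageDatastore('dataset', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
[imdsTrain, imdsVal, imdsTest] = splitEachLabel(imds, 0.7, 0.1, 0.2, ...
    'randomized');
```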
B. DISCUSSION
To evaluate the performance of our proposed model, we performed four different experiments using our dataset and one experiment using a public dataset available at [44], as described in experiments 1 to 5. The performance was evaluated based on training time, testing time, accuracy, recall, precision, specificity, and F1-score, calculated according to equations (2) to (6):

Accuracy = (TP + TN) / (TP + TN + FP + FN) (2)

Recall (R) = TP / (TP + FN) (3)

Precision (P) = TP / (TP + FP) (4)

Specificity = TN / (TN + FP) (5)

F1-score = 2 (R x P) / (R + P) (6)

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively.
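For a binary problem, these metrics can be computed from the confusion matrix as sketched below; for the multiclass experiments the same quantities are computed per class and averaged. The prediction variable predTest is an assumption.

```matlab
% Sketch: equations (2)-(6) from a 2x2 confusion matrix, assuming class 1
% (e.g., healthy) is the positive class in confusionmat's sorted order.
predTest = predict(svmBinary, featTest);
cm = confusionmat(imdsTest.Labels, predTest);   % rows: true, cols: predicted
TP = cm(1,1); FN = cm(1,2); FP = cm(2,1); TN = cm(2,2);

accuracy    = (TP + TN) / (TP + TN + FP + FN);  % eq. (2)
recall      = TP / (TP + FN);                   % eq. (3)
precision   = TP / (TP + FP);                   % eq. (4)
specificity = TN / (TN + FP);                   % eq. (5)
f1score     = 2 * recall * precision / (recall + precision);  % eq. (6)
```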
1) Experiment 1: Performance of the proposed model using the binary dataset
Table 2 summarizes the performance of the transfer learning, CNN-SVM, CNN-RF, and CNN-KNN models on the binary classification task (healthy and reject) in terms of accuracy, with the corresponding training and testing times. In Table 2, transfer learning was first performed to evaluate the performance of our selected pre-trained networks, where an average accuracy of 95.52% was attained for classifying tomatoes into the binary classes. The performance of transfer learning gives a starting point for determining whether a network can be deployed for feature extraction. Among the proposed hybrid models, CNN-SVM attained the highest accuracy of 97.50% when Inceptionv3 was used as the feature extractor. Furthermore, an average training time of 4.45 seconds for 1680 images and an average testing time of 2.43 milliseconds per image were recorded for the proposed hybrid CNN-SVM model, compared with the other models.

2) Experiment 2: Performance of the proposed model using the multiclass dataset
We evaluated our model using the second batch of the dataset (the multiclass set), as shown in Table 1. The model's performance is summarized in Table 3, where transfer learning was first performed to evaluate our selected pre-trained networks for feature extraction, obtaining an average accuracy of 94.95% for multiclass classification. Among the proposed hybrid models, the highest accuracy of 96.67% was attained by CNN-SVM when Inceptionv3 was used as the feature extractor. An average training time of 4.54 seconds for 1680 images and a testing time of 2.44 milliseconds per image were recorded for the proposed CNN-SVM model, compared with the other models.
D. IMAGE ANALYSIS AND DESIGN OPTIMIZATION
3) Experiment 3: Performance of the proposed model in Within the scope of our investigation, the examination of
lightweight CNN on the binary dataset. image features involved aspects related to both appearance
We evaluated our proposed model in lightweight pre- and resolution within the dataset. Our analysis revealed that
trained networks, i.e., ShuffleNet (5.4MB) [45], MobileNetv2
(20MB) [36], and EfficientNetB0 (13MB) [46], using the
binary datasets (healthy and reject). The model’s performance
is summarized in Table 4, where an average accuracy of
92.12% has been obtained for transfer learning as we evaluate
the feature extractor. The highest accuracy of 95.21% was
achieved by CNN-SVM when MobileNetv2 was used as a
feature extractor. On average the training time of 5.68 seconds
for 1680 images and the testing time of 3.28 milliseconds
per image was recorded for the proposed hybrid model CNN-
SVM compared to others.
D. IMAGE ANALYSIS AND DESIGN OPTIMIZATION
Within the scope of our investigation, the examination of image features involved aspects related to both appearance and resolution within the dataset. Our analysis revealed that Inceptionv3 outperforms ResNet50 when handling images with closely aligned appearance characteristics, such as those present in our dataset, which exhibits inter-class color features that are in close proximity. Conversely, ResNet50 excels with images characterized by slightly broader inter-class color features, as found in the public dataset. Notably, Figure 12 provides visual representations of inter-class color features from both our dataset and the public dataset.

TABLE 4. Performance of the proposed model with lightweight CNNs using the binary dataset.

The resolution of our dataset stands at 640x480 pixels, whereas the public dataset is at 256x256 pixels. Our analysis further demonstrates that Inceptionv3 attains superior performance on our dataset because its input resolution (299x299 pixels) closely aligns with our dataset's dimensions [47], in contrast to ResNet50 (224x224 pixels). Conversely, ResNet50 performs marginally better than Inceptionv3 on the public dataset, as its input resolution closely approximates the resolution of the public dataset [48].

In the process of model optimization, we conducted feature extraction from various layers, observing a consistent improvement in classification accuracy with the utilization of progressively deeper layers for feature extraction [49].
Additionally, significant improvements in classification accuracy were observed on our dataset [32] for binary and multiclass classification. Table 8 summarizes the performance comparison of our proposed model with the state of the art (SOTA) in terms of accuracy. From Table 8, it was observed that the proposed model performs better than the SOTA.

V. CONCLUSION
We presented a new approach for tomato quality grading based on external image features. The proposed approach utilized pre-trained networks for feature extraction and traditional machine learning algorithms as the classifiers (hybrid model). We took advantage of fine-tuning techniques in the pre-trained networks to make the deep layers learn and concentrate on the complex and significant features of the tomato images. We also analyzed and applied various image preprocessing techniques for feature enhancement and performance improvement in our proposed method. Features obtained from these networks were then classified by SVM, RF, and KNN classifiers. We demonstrated the performance of various fine-tuned networks, with the highest accuracy attained by Inceptionv3; Inceptionv3 was therefore considered further for feature extraction and used with our classifiers.

Among the proposed hybrid models, the CNN-SVM method outperformed the others with an accuracy of 97.50% in the binary classification of tomatoes into healthy and reject classes and an accuracy of 96.67% in the multiclass classification of tomatoes into ripe, unripe, and reject classes when Inceptionv3 was used as the feature extractor. The hybrid CNN-SVM method also outperformed the other models by achieving an accuracy of 97.54% with Inceptionv3 as the feature extractor when deployed on a public dataset to categorize tomatoes into ripe, unripe, old, and damaged classes. Compared with the public dataset, the classification accuracy of the proposed model on our dataset could potentially be improved if the differences in color characteristics between classes were slightly larger. The investigation revealed that the proposed model outperforms the SOTA. Furthermore, the proposed model can operate against different backgrounds under varying light sources.

This study is subject to certain limitations, notably the potential for overfitting when training a model with a dataset containing fewer than 500 images, particularly images with low intensity and minimal inter-class color features. Additionally, the consideration of feature size and texture was beyond the scope of this study. In future work, we will consider developing a model that operates effectively on images with varying attributes, including color, size, and texture. We also plan to extend the application of our model to various crops and items while concurrently employing an in-depth analysis of image feature characteristics as a complementary approach, aimed at deriving sound and comparable conclusions and thereby enhancing the robustness of our findings beyond the post-training assessment method.

Furthermore, we will investigate optimization techniques for reducing the model size without hurting accuracy, to make it suitable for hardware implementation and deployable for real-time inference. For model deployment, the system may require multiple cameras or rotating mechanical support for the tomatoes to facilitate capturing images at different angles.

ACKNOWLEDGMENT
We express our sincere gratitude to the Egypt-Japan University of Science and Technology (E-JUST) and the TICAD7 scholarship program for graciously providing us with the essential research facilities that were instrumental in the successful execution of our study.

CONFLICT OF INTEREST
The authors have no conflict of interest.

References
[1] FAO, "Tomatoes production." (2021), [Online]. Available: https://www.fao.org/faostat/en/#data/QCL/visualize (visited on 05/04/2023).
[2] Tilasto, "World: Tomatoes, production quantity." (2021), [Online]. Available: https://www.tilasto.com/en/topic/geography-and-agriculture/crop/tomatoes/tomatoes-production-quantity/world?countries=World (visited on 06/20/2023).
[3] R. Nithya, B. Santhi, R. Manikandan, M. Rahimi, and A. H. Gandomi, "Computer vision system for mango fruit defect detection using deep convolutional neural network," Foods, vol. 11, no. 21, p. 3483, 2022.
[4] OECD, "International standards for fruit and vegetables - tomatoes," The Organization for Economic Cooperation, Tech. Rep., 2019.
[5] USDA, "United States consumer standards for fresh tomatoes," The United States Department of Agriculture, Tech. Rep., 2018.
[6] M. M. Khodier, S. M. Ahmed, and M. S. Sayed, "Complex pattern jacquard fabrics defect detection using convolutional neural networks and multispectral imaging," IEEE Access, vol. 10, pp. 10653–10660, 2022.
[7] J. Amin, M. A. Anjum, R. Zahra, M. I. Sharif, S. Kadry, and L. Sevcik, "Pest localization using YOLOv5 and classification based on quantum convolutional network," Agriculture, vol. 13, no. 3, p. 662, 2023.
[8] S. R. Shah, S. Qadri, H. Bibi, S. M. W. Shah, M. I. Sharif, and F. Marinello, "Comparing Inception V3, VGG 16, VGG 19, CNN, and ResNet 50: A case study on early detection of a rice disease," Agronomy, vol. 13, no. 6, p. 1633, 2023.
[9] S. Ahlawat and A. Choudhary, "Hybrid CNN-SVM classifier for handwritten digit recognition," Procedia Computer Science, vol. 167, pp. 2554–2560, 2020.
[10] A. Copiaco, C. Ritz, S. Fasciani, and N. Abdulaziz, "Scalogram neural network activations with machine learning for domestic multi-channel audio classification," in 2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), IEEE, 2019, pp. 1–6.
[11] A. M. Abdelsalam and M. S. Sayed, "Real-time defects detection system for orange citrus fruits using multi-spectral imaging," in 2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS), IEEE, 2016, pp. 1–4.
[12] N. K. Bahia, R. Rani, A. Kamboj, and D. Kakkar, "Hybrid feature extraction and machine learning approach for fruits and vegetable classification," Pertanika Journal of Science and Technology, pp. 1693–1708, 2019.
[13] M. Knott, F. Perez-Cruz, and T. Defraeye, "Facilitated machine learning for image-based fruit quality assessment in developing countries," arXiv preprint arXiv:2207.04523, 2022.
[14] T. Lu, B. Han, L. Chen, F. Yu, and C. Xue, "A generic intelligent tomato classification system for practical applications using DenseNet-201 with transfer learning," Scientific Reports, vol. 11, no. 1, p. 15824, 2021.
[15] S. R. N. Appe, G. Arulselvi, and G. Balaji, "Tomato ripeness detection and classification using VGG based CNN models," International Journal of Intelligent Systems and Applications in Engineering, vol. 11, no. 1, pp. 296–302, 2023.
[16] Y. Fu, M. Nguyen, and W. Q. Yan, "Grading methods for fruit freshness based on deep learning," SN Computer Science, vol. 3, no. 4, p. 264, 2022.
[17] J. S. Tata, N. K. V. Kalidindi, H. Katherapaka, S. K. Julakal, and M. Banothu, "Real-time quality assurance of fruits and vegetables with artificial intelligence," in Journal of Physics: Conference Series, IOP Publishing, vol. 2325, 2022, p. 012055.
[18] T. Tao and X. Wei, "A hybrid CNN-SVM classifier for weed recognition in winter rape field," Plant Methods, vol. 18, no. 1, p. 29, 2022.
[19] Y. Chen, H. Sun, G. Zhou, and B. Peng, "Fruit classification model based on residual filtering network for smart community robot," Wireless Communications and Mobile Computing, vol. 2021, pp. 1–9, 2021.
[20] Z. Zha, D. Shi, X. Chen, H. Shi, and J. Wu, "Classification of appearance quality of red grape based on transfer learning of convolution neural network," 2023.
[21] H. Alshammari, K. Gasmi, I. Ben Ltaifa, M. Krichen, L. Ben Ammar, and M. A. Mahmood, "Olive disease classification based on vision transformer and CNN models," Computational Intelligence and Neuroscience, vol. 2022, 2022.
[22] A. Dosovitskiy, L. Beyer, A. Kolesnikov, et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
[23] D. Ireri, E. Belal, C. Okinda, N. Makange, and C. Ji, "A computer vision system for defect discrimination and grading in tomatoes using machine learning and image processing," Artificial Intelligence in Agriculture, vol. 2, pp. 28–37, 2019.
[24] M. H. Malik, T. Zhang, H. Li, M. Zhang, S. Shabbir, and A. Saeed, "Mature tomato fruit detection algorithm based on improved HSV and watershed algorithm," IFAC-PapersOnLine, vol. 51, no. 17, pp. 431–436, 2018.
[25] S. Malhotra and R. Chhikara, "Automated grading system to evaluate ripeness of tomatoes using deep learning methods," in Proceedings of the Second International Conference on Information Management and Machine Intelligence: ICIMMI 2020, Springer, 2021, pp. 129–137.
[26] N. Begum and M. K. Hazarika, "Maturity detection of tomatoes using transfer learning," Measurement: Food, vol. 7, p. 100038, 2022.
[27] R. G. de Luna, E. P. Dadios, A. A. Bandala, and R. R. P. Vicerra, "Size classification of tomato fruit using thresholding, machine learning, and deep learning techniques," AGRIVITA, Journal of Agricultural Science, vol. 41, no. 3, pp. 586–596, 2019.
[28] G. Liu, S. Mao, and J. H. Kim, "A mature-tomato detection algorithm using machine learning and color analysis," Sensors, vol. 19, no. 9, p. 2023, 2019.
[29] S. Kaur, A. Girdhar, and J. Gill, "Computer vision-based tomato grading and sorting," in Advances in Data and Information Sciences: Proceedings of ICDIS-2017, Volume 1, Springer, 2018, pp. 75–84.
[30] B. D. Learning, "A performance and power analysis," NVidia Whitepaper, Nov. 2015.
[31] Z. Labs, "Optical fruit grading and sorting machine." (2019), [Online]. Available: https://zentronlabs.com/systems/optical-fruit-grading-and-sorting-machine/ (visited on 09/20/2023).
[32] H. S. M. et al., "Tomato fruits dataset for binary and multiclass classification." (2023), [Online]. Available: https://data.mendeley.com/datasets/x4s2jz55dx/1 (visited on 10/10/2023).
[33] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62–66, 1979.
[34] A. Mikołajczyk and M. Grochowski, "Data augmentation for improving deep learning in image classification problem," in 2018 International Interdisciplinary PhD Workshop (IIPhDW), IEEE, 2018, pp. 117–122.
[35] M. Hussain, J. J. Bird, and D. R. Faria, "A study on CNN transfer learning for image classification," in Advances in Computational Intelligence Systems: Contributions Presented at the 18th UK Workshop on Computational Intelligence, September 5-7, 2018, Nottingham, UK, Springer, 2019, pp. 191–202.
[36] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[37] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[38] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[39] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
[40] M. Jogin, M. Madhulika, G. Divya, R. Meghana, S. Apoorva, et al., "Feature extraction using convolution neural networks (CNN) and deep learning," in 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), IEEE, 2018, pp. 2319–2323.
[41] C. Crisci, B. Ghattas, and G. Perera, "A review of supervised machine learning algorithms and their applications to ecological data," Ecological Modelling, vol. 240, pp. 113–122, 2012.
[42] Y.-l. Cai, D. Ji, and D. Cai, "A KNN research paper classification method based on shared nearest neighbor," in NTCIR, vol. 336, 2010, p. 340.
[43] A. Maćkiewicz and W. Ratajczak, "Principal components analysis (PCA)," Computers & Geosciences, vol. 19, no. 3, pp. 303–342, 1993.
[44] N. Q. Khang, "Tomatoes dataset." (2021), [Online]. Available: https://www.kaggle.com/datasets/enalis/tomatoes-dataset (visited on 06/20/2023).
[45] X. Zhang, X. Zhou, M. Lin, and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848–6856.
[46] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in International Conference on Machine Learning, PMLR, 2019, pp. 6105–6114.
[47] J.-w. Feng and X.-y. Tang, "Office garbage intelligent classification based on Inception-v3 transfer learning model," in Journal of Physics: Conference Series, IOP Publishing, vol. 1487, 2020, p. 012008.
[48] H. Touvron, A. Vedaldi, M. Douze, and H. Jégou, "Fixing the train-test resolution discrepancy," Advances in Neural Information Processing Systems, vol. 32, 2019.
[49] J. Yue-Hei Ng, F. Yang, and L. S. Davis, "Exploiting local features from deep networks for image retrieval," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 53–61.
[50] X. Qi, T. Wang, and J. Liu, "Comparison of support vector machine and softmax classifiers in computer vision," in 2017 Second International Conference on Mechanical, Control and Computer Engineering (ICMCCE), IEEE, 2017, pp. 151–155.

HASSAN SHABANI MPUTU received the B.Sc. degree in telecommunications engineering from the University of Dar es Salaam, Tanzania, in 2014. Since 2021, he has been a master's and research student at the Department of Electronics and Communications Engineering, Egypt-Japan University of Science and Technology, Egypt. His research focuses primarily on artificial intelligence and embedded systems.

AHMED ABDEL-MAWGOOD was born in Alexandria, Egypt. He received his Ph.D. from the Department of Biological Sciences, Purdue University, USA. He then moved to the R&D Nucleic Acids Research Department of Promega Corp., Wisconsin, USA, where he worked on the development of new kits for nucleic acid detection. He was offered a professorship at Egypt-Japan University of Science and Technology early in 2019. He is the biotechnology program coordinator and academic advisor for the biotechnology lab, and the director of the food safety and quality management diploma, which is approved by the SCU. His interest is in the application of biotechnology in everyday life, specifically in developing new products using microbes isolated from the environment. He is also interested in the biodegradation of agricultural waste as well as of soil contaminated with hydrocarbons, and in collaborating with engineering to develop new instruments such as conventional PCR.

ATSUSHI SHIMADA (Member, IEEE) received the M.E. and D.E. degrees from Kyushu University, Fukuoka, Japan, in 2007. He is currently a Professor with the Faculty of Information Science and Electrical Engineering, Kyushu University. From 2015 to 2019, he was also a JST-PRESTO researcher. His current research interests include learning analytics, pattern recognition, media processing, and image processing. Dr. Shimada was the recipient of the MIRU Interactive Presentation Award (2011 and 2017), the MIRU Demonstration Award (2015), first place in the Background Models Challenge 2012, the PRMU Research Award (2013), first place in the SBM-RGBD Challenge (2017), the ITS Symposium Best Poster Award (2018), the JST PRESTO Interest Poster Award (2019), the IPSJ/IEEE-Computer Society Young Computer Researcher Award (2019), the CELDA Best Paper Award (2019), and the MEXT Young Scientist Award (2020).