FruitFinder: Comparison of Different YOLO Models for Object Detection

ABSTRACT
This research compares the performance of three YOLO (You Only Look Once) architectures, v5, v8, and v9, on a dataset of 3,776 annotated images of mixed dry fruits and nuts. The goal is to identify the best-performing model for an industrial setting that requires high accuracy in object detection. The capability of each model was analyzed based on accuracy, precision, recall, and mean Average Precision (mAP). The experiments show that YOLOv9 achieves the best precision and recall of 99.5% with a mean Average Precision of 99.2% per class, making it the preferable option for applications that place high demands on detection accuracy. YOLOv5 and YOLOv8 also performed very well, though slightly below YOLOv9. The comparative analysis not only illustrates the progressive improvement of the YOLO architecture but also helps identify model optimizations for the complex operational requirements of the food processing industry. The study can serve as a benchmark for developing automatic object detection technologies with improved accuracy and operational efficiency.

KEYWORDS
Dataset, YOLO models, Accuracy, Precision, Recall, Object Detection, Comparison.

ACM Reference Format:
. 2018. FruitFinder: Comparison of Different YOLO Models for Object Detection. In ICCA 2024: International Conference on Computing Advancements, 05-06th September 2024, Dhaka, Bangladesh. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/1122445.1122456

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICCA 2024, 05-06th September 2024, Dhaka, Bangladesh
© 2018 Association for Computing Machinery.
https://doi.org/10.1145/1122445.1122456

1 INTRODUCTION
Object detection techniques are a foundation of the artificial intelligence field [6]. Object detection is one of the most significant tasks in computer vision, especially when dealing with small and visually similar objects [10]. When detecting small objects such as dry fruits and nuts, traditional deep learning models often fall short in differentiating among such items because of their similarities in texture and color [10]. This study aims to enhance the accuracy and reliability of automated systems in the food processing industry by improving the detection and classification of mixed dry fruits and nuts with the help of the YOLO (You Only Look Once) model.

This study compares the performance of three YOLO models for object detection: YOLOv5, YOLOv8, and YOLOv9. YOLO is a real-time object detection system that effectively balances speed and accuracy, making it essential for industrial applications [13]. In terms of detection accuracy and inference time, YOLO models provide significantly better results than two-stage object detectors [4]. YOLO models make comparatively fewer errors in identifying the background, and they deliver a clear performance boost over two-stage object detection algorithms in background detection [4]. At the same time, the structure of YOLO is straightforward: it directly outputs the position and category of each bounding box through a single neural network [6]. In this study, we compare the architectural and operational differences among these models and determine which is more suitable for object detection.

This study compares the models on a mixed dry fruit and nut dataset. We annotated the images to represent a balanced mix of raisins, peanuts, almonds, cashews, and dates. Each image was annotated using Roboflow [5], which ensures consistent bounding box placements and class labels, essential for training a comprehensive object detection model. The dataset includes varied scenarios such as different lighting, orientations, and angles to effectively train and test the models' performance under real-life conditions.

This study is divided into several sections. Section 2 reviews articles related to our research. Section 3 illustrates the detailed methodology. Section 4 discusses the results of the study, and Section 5 concludes the paper.

2 LITERATURE REVIEW
Over the years, researchers have proposed various object detection models for different purposes. In our research, we identified several studies that have contributed significantly to the field of object detection. Ahamed et al. [11] proposed a real-time pear fruit detection and counting system using YOLOv4 models and Deep SORT. They stated that YOLOv4-CSP requires substantial computational resources but is the most accurate model, with an mAP@0.5 of 98 percent. YOLOv4-tiny, by contrast, emerges as a popular option for scenarios that require high speed and lower computational cost, achieving over 50 frames per second with fewer FLOPs. They evaluated a unique-ID method and an ROI-line method for counting, where the unique-ID method proved more reliable, with an F1 score of 87.85 percent, because of its low false negative rate.

Dayoub et al. [16] proposed a fruit detection system using deep convolutional neural networks and multi-modal fusion. The model improved the F1 score from 0.807 to 0.838 for sweet pepper detection and promises more efficient and effective robotic fruit harvesting. Furthermore, Patel et al. [12] proposed a fruit detection algorithm based on intensity, color, edge, and orientation.
The model is efficient for fruit detection on trees. Their method not only focused on feature extraction and integration but also utilized techniques such as global thresholding and binary mask generation. The model successfully detected fruit in over 95 cases, which demonstrates its effectiveness against complex backgrounds.

Liu et al. [9] proposed a classification model for checking the quality of passion fruit using ATC-YOLOv5, an advanced YOLOv5 variant. They achieved a mean Average Precision (mAP) of 95.36 percent and a mean detection time (mDT) of 3.2 ms. The model integrates improved feature extraction, a transformer layer, and attention mechanisms to handle various quality grades of passion fruits. The improvements reduced model parameters by 10.54 percent, enhancing both the speed and accuracy of the model and making it effective for automating fruit quality classification in modern agricultural systems.

Raj et al. [14] compared various fruit classification techniques using CNN and YOLO models. Their analysis highlighted the efficiency of automation in agriculture, as it enhances productivity and reduces labor dependency. Among the proposed classification models, the YOLO model achieved an accuracy of over 85 percent, whereas the CNN model achieved 74 percent and MATLAB color detection 63 percent. The study highlights the benefits of automation in reducing agricultural costs and suggests that YOLO models provide better performance in fruit classification tasks than the other approaches.

Alruwaili et al. [1] explored the efficiency of YOLO models such as YOLOv8, YOLOv5, and YOLOv7 for object detection, highlighting their ability to identify objects with high accuracy and real-time processing capability. They trained the models on a dataset of 4,300 images, where YOLOv8 performed with an overall precision of 0.907 and a recall of 0.943. The model generated excellent results in wheelchair detection, with a precision of 0.998, highlighting its effectiveness in real-world applications. YOLOv8 also outperformed the other models in detection speed, processing frames significantly faster, which underscores its potential for integration into technologies that require real-time operational capabilities.

Legaspi et al. [8] proposed a system for the real-time detection and classification of whiteflies and fruit flies using YOLOv3 and a convolutional neural network model. They used a Raspberry Pi camera to collect images and implemented a REST API to process these images for pest monitoring. The system achieved an overall accuracy of 83.07 percent in identifying these pests, underscoring its significance for enhancing agricultural productivity through early detection and treatment. The research also addressed the challenges of detecting small and fast-moving pests; by utilizing YOLOv3, which is effective at processing complex visual data, the study aims to develop a more robust system for agricultural pest management.

Talib et al. [17] proposed a smart model for the detection of small objects in real-time scenarios using YOLOv8-CAB, an advancement of YOLOv8 that integrates Context Attention Blocks (CAB) to enhance the performance of the algorithm. It utilizes multi-scale feature maps and iterative feedback to optimize object detection for smaller objects. The authors demonstrated the efficiency of the model by testing on the COCO dataset, achieving a mean Average Precision (mAP) of 97 percent, which is 1 percent better than traditional models. The study enhances feature extraction and object detection without increasing the model's complexity.

Latha et al. [7] proposed a model using deep learning and IoT technologies to address the challenge of real-time detection of fruits and vegetables in markets, enabling customers to know the current stock of products. They proposed the YOLOv4-tiny model, which is suited to embedded devices because of its fast processing capabilities and accuracy. They collected images from Google and Kaggle and labeled them with Roboflow. The study achieved a mean Average Precision (mAP) of 51 percent for the YOLOv4-tiny model while maintaining a rapid inference time of 18 milliseconds. This study underscores the potential of deep learning models for enhancing customer experience and operational efficiency in vegetable markets.

Benjumea et al. [2] modified YOLOv5 to enhance its detection capabilities for small objects, specifically for use in autonomous vehicle systems. Their study introduced a variant named YOLO-Z, which improved mean Average Precision (mAP) for small objects by up to 6.9 percent at a minimal increase in inference time. This adjustment allows earlier detection of objects at greater distances and improves the decision-making capabilities of autonomous vehicles. The study not only adjusts YOLOv5's structural elements but also examines the impact of these changes, aiming to extend the model's applicability to other precision-critical tasks in real-life scenarios.

3 METHODOLOGY
Our objective is to compare YOLO models to enhance object detection and to further research in computer vision and deep learning. For this study, we chose several YOLO models.

3.1 YOLO Model
YOLO is a widely popular model for real-time object detection. It is fairly accurate and capable of detecting small objects of similar size and texture, such as nuts and dry fruits [15]. It employs a single neural network to predict bounding boxes and class probabilities, achieving remarkable speed without sacrificing accuracy [3]. In this article, we use the YOLOv5m, YOLOv8m, and YOLOv9c models for object detection.

3.2 Experimental Design
In Figure 1, we show the detailed design of our research. We collected the datasets and used 70% for training, 20% for validation, and 10% for testing. We then trained three pre-trained models, YOLOv5m, YOLOv8m, and YOLOv9c, on the data.
Finally, we used evaluation metrics such as mAP50, the confusion matrix, precision, and recall to evaluate the models and compare the results to find the best model for dry fruit and nut detection.

Figure 1: Experimental Design.

3.3 Dataset
The models were initially trained using over 3,776 images of different dry fruits and nuts. The dataset is divided into 5 distinct classes: Almond, Date, Peanut, Cashew, and Raisin. All images are annotated with bounding boxes to identify and localize each dry fruit or nut within the image using Roboflow [5]. All images were resized to a standardized dimension of 640x640 pixels. Notably, the dataset includes overlapping images to enrich the robustness of the training data. The dataset was specially prepared to train and evaluate object detection models such as YOLOv5, YOLOv8, and YOLOv9.

3.4 Work Flow
The overall process is depicted in Figure 1. In this section, we summarize each step.

• Data Collection and Preprocessing: In this phase, we captured the images using a mobile camera. The images were then resized to 640x640 pixels and annotated using Roboflow bounding boxes. Finally, we split them in a 70:10:20 ratio for training, testing, and validation to feed the models.
• Model Training: The models were compiled in a Google Colaboratory environment with a T4 CUDA GPU with 15 GB of VRAM. We employed three pre-trained YOLO architectures: YOLOv5m, YOLOv8m, and YOLOv9c. Each was trained on the annotated dataset using specified hyperparameters such as epochs and batch size, with an input image size of 640x640 and a batch size of 640 for at least 145 epochs. The training process optimizes custom loss functions to improve object detection accuracy.
• Model Evaluation and Reporting: The final step is to analyze the results and compare the detection performance of each model variant. Key metrics such as mAP50, precision, recall, and the confusion matrix are compared across the YOLO models to identify the best-performing architecture for dry fruit and nut detection.

3.5 Model Architecture
We utilized a variety of YOLO architectures for dry fruit and nut detection; Table 1 summarizes their architecture. Each YOLO variant uses a grid-based methodology, which is efficient for object detection because it processes the images holistically rather than using segmentation. This diverse set of models enables us to meet a range of requirements and underscores the versatility and effectiveness of the YOLO framework in addressing object detection challenges, while enhancing the computational efficiency of high-performance, precision-focused applications.

Table 1: YOLO Models Architecture

Model     Layers  Parameters   Inference Speed (GFLOPs)
YOLOv5m   212     20,869,098   47.9
YOLOv8m   384     25,323,103   102.3
YOLOv9c   384     25,323,103   102.3

4 RESULT AND DISCUSSION

4.1 Overview of Model Performance
Our implementation of YOLOv5m, YOLOv8m, and YOLOv9c on a dataset of mixed dry fruits and nuts shows varied performance capabilities across these models (shown in Table 2). We evaluated the models based on mean Average Precision (mAP), precision, recall, and their ability to maintain performance across different confidence thresholds. Of all the models, YOLOv9c showed slightly better performance (shown in Figure 2), but YOLOv5 and YOLOv8 also showed strengths in specific scenarios.

Table 2: Models performance metrics

Model     Precision  Recall  mAP50
YOLOv5m   .994       .994    .989
YOLOv8m   .994       .994    .991
YOLOv9c   .995       .995    .992
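The precision, recall, and mAP50 values above are derived by matching predicted boxes to ground truth at an IoU threshold of 0.5 and averaging per-class precision over recall. A minimal sketch of that computation (not the authors' evaluation code; the matching of each detection to ground truth is assumed to be done upstream):

```python
from typing import List, Tuple

def iou(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def average_precision(preds: List[Tuple[float, bool]], n_gt: int) -> float:
    """AP at one IoU threshold from (confidence, matched-at-IoU>=0.5) pairs."""
    preds = sorted(preds, key=lambda p: -p[0])      # rank by confidence
    tp = fp = 0
    area = prev_recall = 0.0
    for _, is_tp in preds:
        tp, fp = tp + is_tp, fp + (not is_tp)
        recall, precision = tp / n_gt, tp / (tp + fp)
        area += precision * (recall - prev_recall)  # rectangle rule under the P-R curve
        prev_recall = recall
    return area
```

mAP50 is then the mean of the per-class AP values over the five dry fruit and nut classes.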
Figure 2: Comparison of Evaluation Metrics.

4.2 Detailed Performance Analysis

4.2.1 YOLOv5m.
YOLOv5 provides excellent performance, with a good balance reflected across its Confusion Matrix (Figure 3), Precision-Confidence Curve (Figure 4), and Recall-Confidence Curve (Figure 5). It is particularly suited to scenarios where rapid object detection matters most, without risking a significant loss in accuracy.

Figure 3: YOLOv5 Confusion Matrix.
Figure 4: YOLOv5 Precision-Confidence Curve.
Figure 5: YOLOv5 Recall-Confidence Curve.
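The precision-confidence and recall-confidence curves discussed in this section are traced by sweeping the detector's confidence threshold: raising the threshold discards low-confidence detections, which typically raises precision and lowers recall. A hedged sketch of the underlying computation (the example detections are invented for illustration, not taken from the paper's dataset):

```python
from typing import List, Tuple

def pr_at_confidence(preds: List[Tuple[float, bool]],
                     n_gt: int, threshold: float) -> Tuple[float, float]:
    """Precision and recall when only detections with confidence >= threshold are kept.

    preds: (confidence, is_true_positive) pairs; n_gt: number of ground-truth objects.
    """
    kept = [is_tp for conf, is_tp in preds if conf >= threshold]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 1.0  # convention: precision -> 1 as threshold -> 1
    recall = tp / n_gt
    return precision, recall

# sweeping the threshold from 0 to 1 traces both curves
detections = [(0.95, True), (0.90, True), (0.60, False), (0.40, True)]
curve = [(t / 10, *pr_at_confidence(detections, 3, t / 10)) for t in range(11)]
```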
4.2.2 YOLOv8m.
YOLOv8 slightly outperforms YOLOv5 in terms of mAP50, providing a marginal but consistent improvement in accuracy across the performance measures: the Confusion Matrix (Figure 6), Precision-Confidence Curve (Figure 7), and Recall-Confidence Curve (Figure 8). This model can be effective for applications where precision is slightly favored, offering better performance with better class separation ability.

Figure 6: YOLOv8 Confusion Matrix.
Figure 7: YOLOv8 Precision-Confidence Curve.
Figure 8: YOLOv8 Recall-Confidence Curve.

4.2.3 YOLOv9c.
YOLOv9 provides the highest precision (Figure 10) and recall (Figure 11) among the tested models, along with its confusion matrix (Figure 9). These performance measures make it the best option for environments that require significant accuracy, such as product sorting, where mistakes can lead to significant operational setbacks.

Figure 9: YOLOv9 Confusion Matrix.
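The head-to-head comparison can also be summarized programmatically. A small sketch that selects the deployment model from the reported metrics (the values are copied from Table 2; the `best_model` helper is ours, not from the paper):

```python
# values copied from Table 2 of this study
results = {
    "YOLOv5m": {"precision": 0.994, "recall": 0.994, "mAP50": 0.989},
    "YOLOv8m": {"precision": 0.994, "recall": 0.994, "mAP50": 0.991},
    "YOLOv9c": {"precision": 0.995, "recall": 0.995, "mAP50": 0.992},
}

def best_model(metrics: dict, key: str = "mAP50") -> str:
    """Name of the model with the highest value for the chosen metric."""
    return max(metrics, key=lambda name: metrics[name][key])

print(best_model(results))  # YOLOv9c, matching the paper's conclusion
```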
Figure 10: YOLOv9 Precision-Confidence Curve.
Figure 11: YOLOv9 Recall-Confidence Curve.

4.3 Comparative Analysis
The results are very close among the models, with YOLOv9 slightly leading due to its better detection accuracy (shown in Figure 12). Still, the choice of model should be made according to the specific application needs:

• YOLOv5: This model provides excellent speed and good precision for real-time processing where slight compromises on accuracy are tolerable.
• YOLOv8: This model is best for settings where both speed and higher precision are needed, with slightly better consistency than YOLOv5.
• YOLOv9: This model is perfect for the highest accuracy requirements and suitable for high-stakes precision tasks such as sorting, where every detail matters.

Figure 12: Comparison of Detection Accuracy.

4.4 Future work
Future researchers should focus on optimizing the model for specific object classes and on integrating the model into automated systems, which would benefit automated operational environments in the industrial sector. They should also consider the impact of varying environmental conditions on model performance to ensure the best possible performance across all real-time scenarios.

5 CONCLUSION
In this research, a comparative analysis of YOLO models highlights each model's strengths and limitations in detecting small objects such as mixed dry fruits and nuts. YOLOv9 provided the highest overall precision, recall, and mAP50 for object detection, which makes it especially suitable for applications where accuracy is fundamental. YOLOv5, while slightly less preferable, offers a balance of speed and accuracy that makes it valuable for rapid detection. YOLOv8 provides better accuracy than YOLOv5 without sacrificing much detection speed. These findings provide insight into the performance of these models and help in choosing the appropriate model for specific operational requirements. The models have also produced insights that are not covered in this paper, and future research should examine how they perform in other complex object detection settings. In the meantime, expanding the dataset and testing under complex environmental conditions may provide deeper insights into performance and practical applicability in real-time scenarios.

6 ACKNOWLEDGMENTS
We would like to thank Abdullah Al Sakib for his feedback and discussion on the prediction models. We would also like to thank United International University for its cooperation and for funding the research.

REFERENCES
[1] Madallah Alruwaili, Muhammad Nouman Atta, Muhammad Hameed Siddiqi, Abdullah Khan, Asfandyar Khan, Yousef Alhwaiti, and Saad Alanazi. 2023. Deep Learning-Based YOLO Models for the Detection of People With Disabilities. IEEE Access (2023).
[2] Aduen Benjumea, Izzeddin Teeti, Fabio Cuzzolin, and Andrew Bradley. 2021. YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles. arXiv preprint arXiv:2112.11798 (2021).
[3] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020).
[4] Tausif Diwan, G Anirudh, and Jitendra V Tembhurne. 2023. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimedia Tools and Applications 82, 6 (2023), 9243–9275.
[5] B. Dwyer, J. Nelson, T. Hansen, et al. 2024. Roboflow (Version 1.0). Computer vision software. https://roboflow.com
[6] Peiyuan Jiang, Daji Ergu, Fangyao Liu, Ying Cai, and Bo Ma. 2022. A Review of YOLO algorithm developments. Procedia Computer Science 199 (2022), 1066–1073.
[7] RS Latha, GR Sreekanth, R Rajadevi, SK Nivetha, K Ajith Kumar, V Akash, S Bhuvanesh, and Pon Anbarasu. 2022. Fruits and vegetables recognition using YOLO. In 2022 International Conference on Computer Communication and Informatics (ICCCI). IEEE, 1–6.
[8] Krystoffer Rowick B Legaspi, Niño Warren S Sison, and Jocelyn Flores Villaverde. 2021. Detection and classification of whiteflies and fruit flies using YOLO. In 2021 13th International Conference on Computer and Automation Engineering (ICCAE). IEEE, 1–4.
[9] Changhong Liu, Weiren Lin, Yifeng Feng, Ziqing Guo, and Zewen Xie. 2023. ATC-YOLOv5: Fruit Appearance Quality Classification Algorithm Based on the Improved YOLOv5 Model for Passion Fruits. Mathematics 11, 16 (2023), 3615.
[10] Yang Liu, Peng Sun, Nickolas Wergeles, and Yi Shang. 2021. A survey and performance evaluation of deep learning methods for small object detection. Expert Systems with Applications 172 (2021), 114602.
[11] Addie Ira Borja Parico and Tofael Ahamed. 2021. Real time pear fruit detection and counting using YOLOv4 models and deep SORT. Sensors 21, 14 (2021), 4803.
[12] Hetal N Patel, RK Jain, Manjunath V Joshi, et al. 2011. Fruit detection using improved multiple features based algorithm. International Journal of Computer Applications 13, 2 (2011), 1–5.
[13] Dinh-Lam Pham, Tai-Woo Chang, et al. 2023. A YOLO-based real-time packaging defect detection system. Procedia Computer Science 217 (2023), 886–894.
[14] Riyanshu Raj, SS Nagaraj, Saurav Ritesh, TA Thushar, and VM Aparanji. 2021. Fruit Classification Comparison Based on CNN and YOLO. In IOP Conference Series: Materials Science and Engineering, Vol. 1187. IOP Publishing, 012031.
[15] Joseph Redmon and Ali Farhadi. 2018. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018).
[16] Inkyu Sa, Zongyuan Ge, Feras Dayoub, Ben Upcroft, Tristan Perez, and Chris McCool. 2016. DeepFruits: A fruit detection system using deep neural networks. Sensors 16, 8 (2016), 1222.
[17] Moahaimen Talib, Ahmed HY Al-Noori, and Jameelah Suad. 2024. YOLOv8-CAB: Improved YOLOv8 for Real-time object detection. Karbala International Journal of Modern Science 10, 1 (2024), 5.