Article
CES-YOLOv8: Strawberry Maturity Detection Based on the
Improved YOLOv8
Yongkuai Chen 1,† , Haobin Xu 1,2,† , Pengyan Chang 1 , Yuyan Huang 1 , Fenglin Zhong 2 , Qi Jia 4 , Lingxiao Chen 5 ,
Huaiqin Zhong 3, * and Shuang Liu 2, *
1 Institute of Digital Agriculture, Fujian Academy of Agricultural Sciences, Fuzhou 350003, China;
[email protected] (Y.C.); [email protected] (H.X.); [email protected] (P.C.);
[email protected] (Y.H.)
2 College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, China;
[email protected]
3 Crops Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou 350003, China
4 Jiuquan Academy of Agriculture Sciences, Jiuquan 735099, China; [email protected]
5 Fujian Agricultural Machinery Extension Station, Fuzhou 350002, China; [email protected]
* Correspondence: [email protected] (H.Z.); [email protected] (S.L.)
† These authors contributed to the work equally and should be regarded as co-first authors.
Abstract: Automatic harvesting robots are crucial for enhancing agricultural productivity, and precise fruit maturity detection is a fundamental and core technology for efficient and accurate harvesting. Strawberries are distributed irregularly, and their images contain a wealth of characteristic information, ranging from simple, intuitive features to deeper abstract meanings. These complex features pose significant challenges to robots in determining fruit ripeness. To increase the precision, accuracy, and efficiency of robotic fruit maturity detection, a strawberry maturity detection algorithm based on an improved CES-YOLOv8 network structure derived from YOLOv8 was developed in this study. Initially, to reflect the characteristics of actual planting environments, image data were collected under various lighting conditions, degrees of occlusion, and angles during the data collection phase. Subsequently, parts of the C2f module in the YOLOv8 model's backbone were replaced with the ConvNeXt V2 module to enhance the capture of features in strawberries of varying ripeness, and the ECA attention mechanism was introduced to further improve feature-representation capability. Finally, the angle compensation and distance compensation of the SIoU loss function were employed to enhance the IoU, enabling the rapid localization of the model's prediction boxes. The experimental results show that the improved CES-YOLOv8 model achieves an accuracy, recall rate, mAP50, and F1 score of 88.20%, 89.80%, 92.10%, and 88.99%, respectively, in complex environments, indicating improvements of 4.8%, 2.9%, 2.05%, and 3.88%, respectively, over those of the original YOLOv8 network. This algorithm provides technical support for automated harvesting robots to achieve efficient and precise automated harvesting. Additionally, the algorithm is adaptable and can be extended to other fruit crops.

Keywords: automatic harvesting robots; CES-YOLOv8; strawberry maturity
1. Introduction
With the dual pressures of global population growth and a gradual reduction in arable land, increasing agricultural production has become an important societal challenge. The implementation of smart agriculture is a key solution to this challenge, in which the use of digital information technology and intelligent equipment is crucial for achieving efficient and sustainable agricultural development [1]. Among the many applications of smart agriculture, automated harvesting robot technology can replace manual labor, significantly increasing harvesting efficiency, which is especially important in regions with high labor costs or labor shortages. Fruit maturity detection is a fundamental and critical technology for the efficient and accurate performance of automated harvesting robots.
Traditional automated harvesting systems mostly rely on simple color and size recogni-
tion for determining fruit maturity. Yamamoto and others proposed an algorithm based on
color threshold segmentation to isolate strawberry targets [2]. Hayashi and others designed
a strawberry-harvesting robot that also uses a color threshold segmentation algorithm to
detect strawberries and estimate maturity [3]. Kaur and others utilized external quality fea-
tures such as color, texture, and size to detect the maturity of plums [4]. Villaseñor-Aguilar
and others proposed a new fuzzy classification framework based on the RGB color model
to categorize the maturity of tomatoes [5]. Although these methods have resolved the maturity detection issue to some extent, they impose stringent requirements on the detection environment and growth conditions. In actual production, however, the fruit maturation process is influenced by many factors, such as the fruit variety and growing conditions (e.g., light and humidity), which affect the color at maturity. Moreover, the color of the fruit may also change due to shading, pests, and diseases, among other reasons [6], which can affect color recognition accuracy.
By automatically learning the intrinsic connections and patterns within annotated
datasets, deep learning technologies can be used to effectively extract deep features from
images; they exhibit especially high accuracy and rapid identification in complex scene
target detection and classification. In recent years, deep learning technologies have been
rapidly integrated into various agricultural research fields, including fruit maturity de-
tection in complex environments. Subramanian Parvathi and others have improved the
region-based Faster R-CNN model for detecting the maturity of coconuts against complex
backgrounds [7]. Wang Lingmin and colleagues utilized an enhanced AlexNet model
to classify the maturity of bananas, achieving an accuracy of 96.67% [8]. Zan Wang and
associates designed an improved Faster R-CNN model, MatDet, for the detection of tomato
maturity. Experimental results indicate that, in complex scenes, the proposed model
achieved optimal detection results under conditions of branch occlusion, fruit overlapping,
and lighting effects, with a mean average precision (mAP) of 96.14% [9]. Chen Fengjun
and colleagues proposed an improved method for detecting the maturity of olive fruits
using EfficientDet, and the model’s precision P, recall rate R, and mean average precision
mAP in the test set were 92.89%, 93.59%, and 94.60%, respectively [10]. Wang Congyue
and others introduced an enhanced object detection algorithm based on YOLOv5n for the
real-time identification and maturity detection of cherry tomatoes, achieving an average
accuracy of 95.2% [11]. Elizabeth Haruna Kazama and others used an enhanced YOLOv8
model modified through convolution blocks (RFCAConv) to classify the maturity stages of
coffee fruits, with the model reaching an mAP@0.5 of 74.20% [12]. Megalingam, Rajesh
Kannan, and colleagues proposed an integrated fuzzy deep learning model (IFDM) for
classifying the maturity levels of coconuts. The study showed that the real-time learning
model achieved an accuracy of 86.3% in classifying coconut maturity levels [13]. Currently,
fruit maturity detection methods based on convolutional neural networks have rapidly
developed, yet issues remain. Methods with high detection accuracy often have high com-
putational complexity and slow detection speeds, while methods that are computationally
simpler and faster tend to have lower accuracy [14].
To address the aforementioned issues, this study used strawberries as its research
subject and basis for improvements on the YOLOv8 object detection network, propos-
ing a novel strawberry ripeness detection algorithm named CES-YOLOv8 to enhance the
accuracy of ripeness detection. The algorithm enhances the accuracy and robustness of
strawberry ripeness recognition via automated harvesting robots under various environ-
mental conditions without sacrificing real-time processing capabilities, providing technical
support for efficient and precise automated harvesting. This research not only helps en-
hance the practicality and economic benefits of automated harvesting technology but also
offers technical references for smart agriculture in precision agricultural management,
harvesting, and sorting.
2. Materials and Methods

2.1. Classification of Strawberry Ripeness
In this study, strawberry ripeness was classified into four levels based on the growth and color changes of the fruit, as shown in Table 1.
Level 1 is the unripe stage, characterized by hard flesh and a green surface on the skin. Level 2 is the white ripe stage, where the fruit begins to change from green to white, with pale red spots starting to appear in some areas, and Level 3 is the color-changing stage, where the color change of the fruit becomes more pronounced, with the red coloring starting to spread and cover more of the fruit surface, although some areas remain white or pale red. Level 4 is the fully ripe stage, in which the color of the strawberry uniformly turns bright red.
Table 1. Classification of strawberry maturity levels. (Reference images omitted.)

Grade | Label | Description
1 | Immature_stage | Fruit remains green
2 | Mature_white_stage | Fruit begins to change from green to white; some varieties start to show light red spots
3 | Color_turning_stage | Red starts to spread and cover more of the fruit surface, but there are still areas that are white or light red
4 | Ripe_stage | The color of the strawberries uniformly turns bright red
2.2. Image Collection and Dataset Construction

The experimental data for this study were collected from the China Israel Demonstration Farm polytunnel greenhouse of the Fujian Academy of Agricultural Sciences. The strawberry varieties targeted in this experiment included Hongyan, Xiangye, and Yuexiu, among others.

2.2.1. Strawberry Image Acquisition

Strawberry image collection ensured that the dataset reflected the actual planting environment characteristics, such as irregular distribution, uneven lighting, and mutual occlusion between leaves and fruits. The collection times were set from 8:00 AM to 5:00 PM.
The annotations were saved as .txt files in YOLO format. The dataset was divided into training and test sets at a 4:1 ratio, resulting in 2177 images for training and 545 for testing.
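For illustration only, a 4:1 split over paired images and YOLO-format labels can be produced along the following lines; the directory layout and file names here are hypothetical, not the authors' preprocessing script.

```python
import random
from pathlib import Path

random.seed(0)  # fix the shuffle so the split is reproducible

# Hypothetical layout: each image in dataset/images has a same-named
# YOLO-format .txt label file in dataset/labels.
images = sorted(Path("dataset/images").glob("*.jpg"))
random.shuffle(images)

split = int(len(images) * 0.8)  # 4:1 train/test ratio
train, test = images[:split], images[split:]

for name, subset in (("train.txt", train), ("test.txt", test)):
    Path(name).write_text("\n".join(str(p) for p in subset))
```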
Figure 4. Improved network structure diagram.

2.3.1. ConvNeXt V2 Module
ConvNeXt V2, introduced by Sanghyun Woo and others [18], is a novel convolutional neural network architecture that incorporates a fully convolutional masked autoencoder (FCMAE) and a lightweight ConvNeXt decoder, as shown in Figure 5. The encoder uses sparse convolutions to process only the visible parts of the input, reducing the pretraining computational costs and allowing the model to use the remaining contextual information to predict missing parts, thus enhancing its ability to learn and understand visual data. Additionally, a global response normalization (GRN) layer is introduced in the convolutional network to enhance feature competition between the channels. The GRN enhances feature contrast, and through selective steps of global feature aggregation, normalization, and calibration, it helps prevent feature collapse and thereby improves the model's expressive and generalization capabilities [19]. This module enhances the performance of pure convolutional neural networks in various downstream tasks, and the module structure is shown in Figure 6.

In the detection of strawberry ripeness, ConvNeXt V2 randomly masks parts of the strawberry image. Through processing with sparse convolution, it predicts the masked areas to capture details within the strawberry image, accurately capturing features while reducing computational costs without sacrificing performance. Concurrently, the GRN layer enhances the competition among feature channels, helping the model better distinguish subtle differences between strawberries of different maturities, thus improving recognition accuracy.
Figure 5. FCMAE fully convolutional masked autoencoder.

Figure 6. ConvNeXt V2 module.
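For concreteness, the aggregation-normalization-calibration sequence of the GRN layer can be written in a few lines. The following is a minimal PyTorch sketch following the ConvNeXt V2 paper [18]; the channels-last layout and the learnable gamma/beta parameters are conventions of that paper, not code released with this study.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global response normalization (Woo et al., ConvNeXt V2 [18]).

    Works on channels-last tensors of shape (N, H, W, C):
    1) aggregate a global response per channel (L2 norm over H and W);
    2) normalize each channel's response by the cross-channel mean;
    3) calibrate the input with a learnable scale/shift plus a residual.
    """

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)     # (N, 1, 1, C) aggregation
        nx = gx / (gx.mean(dim=-1, keepdim=True) + self.eps)  # channel competition
        return self.gamma * (x * nx) + self.beta + x          # calibration + residual
```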
2.3.2. ECA Attention Mechanism

Attention mechanisms dynamically adjust the weights of the input features within a network [20], enabling better perception of the distinctive features in images and facilitating rapid target localization. This mechanism has been widely adopted in computer vision. The efficient channel attention (ECA) module (Figure 7) avoids the dimension reduction found in the squeeze-and-excitation (SE) module. It learns channel attention directly after global average pooling using a one-dimensional convolution, maintaining the dimensionality of the channels [21]. A key feature of the ECA module is its adaptive method for determining the size (k) of the one-dimensional convolutional kernel, which aligns the local cross-channel interaction range with the channel dimensions, facilitating efficient learning without manual adjustments. Due to its light weight and minimal additional parameters, the ECA module significantly reduces model complexity while maintaining performance.

Figure 7. ECA attention mechanism structure.
In this study, an ECA attention mechanism was added above the SPPF layer of the back-
bone network. The ECA attention mechanism avoids dimensionality reduction, preserving
more original feature information of strawberries at different maturity levels, thereby
enhancing feature-representation capabilities. Local interactions of one-dimensional con-
volution enable the model to focus more on key feature areas related to maturity and
automatically adjust the range of the receptive field based on different feature layers,
allowing the model to flexibly handle changes in the strawberry-ripening process.
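For concreteness, a minimal PyTorch sketch of an ECA block following Wang et al. [21] is given below; the adaptive kernel-size rule (with γ = 2 and b = 1) comes from that paper, and the variable names are illustrative rather than taken from this study's code.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention (Wang et al. [21]).

    Global average pooling -> 1D convolution across channels
    (no dimension reduction) -> sigmoid gating of the input.
    """

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive kernel size: nearest odd value of (log2(C) + b) / gamma.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 == 1 else t + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) -> per-channel descriptor (N, 1, C)
        y = self.avg_pool(x).squeeze(-1).transpose(-1, -2)
        # Local cross-channel interaction, back to (N, C, 1, 1).
        y = self.conv(y).transpose(-1, -2).unsqueeze(-1)
        return x * self.sigmoid(y)  # channel-wise reweighting
```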
Figure 8. Schematic diagram of the SIoU loss function.
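For reference, the loss the figure illustrates can be sketched as follows, following Gevorgyan's formulation [23]: an IoU term plus averaged distance and shape costs, with the distance cost weighted by an angle cost that rewards center alignment along the x- or y-axis. The θ = 4 shape exponent and all variable names below are conventions from that paper, not code from this study.

```python
import math

def siou_loss(pred, target, theta=4.0, eps=1e-7):
    """SIoU loss for axis-aligned boxes given as (x1, y1, x2, y2) tuples."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # IoU term.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Angle cost Lambda = sin(2 * alpha), alpha from the center offsets.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    sigma = math.hypot(dx, dy) + eps
    angle_cost = math.sin(2 * math.asin(min(abs(dy) / sigma, 1.0)))

    # Distance cost over the smallest enclosing box, angle-weighted.
    ew = max(px2, tx2) - min(px1, tx1) + eps
    eh = max(py2, ty2) - min(py1, ty1) + eps
    gamma = 2.0 - angle_cost
    dist_cost = (1 - math.exp(-gamma * (dx / ew) ** 2)) \
              + (1 - math.exp(-gamma * (dy / eh) ** 2))

    # Shape cost penalizes width/height mismatch.
    ww = abs(pw - tw) / (max(pw, tw) + eps)
    wh = abs(ph - th) / (max(ph, th) + eps)
    shape_cost = (1 - math.exp(-ww)) ** theta + (1 - math.exp(-wh)) ** theta

    return 1 - iou + (dist_cost + shape_cost) / 2
```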
2.4. Model Evaluation Metrics

The process used to determine the ripeness of strawberry fruits must consider both detection accuracy and performance. For model detection accuracy, precision, recall, and the F1 score are used as evaluation metrics. For model performance, the mAP50, which is the mean average precision at a threshold of 50%, was selected as the evaluation metric. The formulas are as follows:

Precision = TP / (TP + FP) × 100% (1)

Recall = TP / (TP + FN) × 100% (2)

F1 = (2 × Precision × Recall) / (Precision + Recall) × 100% (3)

AP = (1/k) Σ_{i=1}^{k} (P_i × R_i) (4)

mAP = (1/k) Σ_{i=1}^{k} AP_i (5)

TP (true positives) represents the number of actual positive samples predicted as positive. FP (false positives) represents the number of actual negative samples predicted as positive. FN (false negatives) represents the number of actual positive samples predicted as negative. TN (true negatives) represents the number of actual negative samples predicted as negative.
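As an illustration, the counting-based metrics of Equations (1)-(3) and (5) can be computed as follows; this is a generic sketch over hypothetical per-class counts, not the evaluation code used in this study.

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from detection counts (Equations (1)-(3))."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def mean_ap(per_class_ap):
    """mAP: mean of the per-class average precision values (Equation (5))."""
    return sum(per_class_ap) / len(per_class_ap)

# Hypothetical counts for one maturity class at an IoU threshold of 0.5.
p, r, f1 = detection_metrics(tp=441, fp=59, fn=50)
```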
3. Experiments and Result Analysis
3.1. Experimental Environment Configuration and Network Parameters
The training and testing of this study’s model were performed on a computer equipped
with an Intel Core i7-13700K CPU at 3.4 GHz, 32 GB of RAM, and a Windows 10 (64-bit)
operating system, accelerated by a GeForce RTX 4070 Ti GPU with 12 GB of VRAM. The
programming language used was Python 3.8.10, the deep learning framework was PyTorch
1.2.0, and the OpenCV version was 4.8. The initial learning rate was set to 0.001 to balance
the model’s convergence speed and learning efficiency, preventing instability due to rapid
convergence. Additionally, a momentum decay strategy was employed, with a value set
to 0.937, to speed up the learning process and avoid local minima. Finally, to enhance the
model’s generalization capability, a weight decay of 0.0005 was set, which helped reduce
the risk of overfitting.
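For orientation, these hyperparameters map directly onto the arguments of a standard Ultralytics-style YOLOv8 training call. The sketch below assumes the ultralytics package and a hypothetical strawberry.yaml dataset config; it is not the authors' released training script.

```python
from ultralytics import YOLO

# Start from a YOLOv8 baseline; the paper's CES-YOLOv8 additionally swaps
# C2f blocks for ConvNeXt V2, adds ECA above the SPPF, and uses the SIoU loss.
model = YOLO("yolov8s.pt")  # model variant is an assumption, not stated here

model.train(
    data="strawberry.yaml",  # hypothetical dataset config (4 maturity classes)
    epochs=100,              # assumption; the epoch count is not given in this section
    lr0=0.001,               # initial learning rate, as reported
    momentum=0.937,          # momentum setting, as reported
    weight_decay=0.0005,     # weight decay, as reported
)
```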
After the SIoU loss function was adopted, there were slight increases in the mAP50
and F1 values, with the greatest improvement in precision being 2.5%. This indicates that
the SIoU loss function, through more refined bounding box regression optimization, helped
the model achieve better performance. The application of the ECA attention mechanism
also improved the model’s precision, recall, and mAP50, demonstrating that the enhance-
ment of the attention mechanism strengthened the feature expression of specific channels,
significantly improving the accurate positioning and recognition of targets. Replacing
the backbone network with ConvNeXt V2 increased the precision, recall, mAP50, and F1
values by 3.2%, 0.9%, 0.4%, and 2.09%, respectively. This significant improvement indicates
that ConvNeXt V2, compared to the original YOLOv8 structure, better captures image
features and enhances model performance. Combining ConvNeXt V2, ECA, and SIoU
allowed the model’s performance to reach the highest values among all the metrics, with
precision, recall, mAP50, and F1 values of 88.20%, 89.80%, 92.10%, and 88.99%, respectively,
indicating increases of 4.8%, 2.9%, 2.05%, and 3.88%.
This study also calculated the frames per second (FPS) with the test dataset. The results
show that the original YOLOv8 had an FPS of 220.92. The introduction of the ECA attention
mechanism and SIoU loss function improved the processing speed, but the inclusion of
ConvNeXt V2 in the backbone network increased the image features and computational
time, leading to a noticeable decrease in FPS. Despite this, the FPS of the improved model
still reached 184.52, far exceeding the requirements for real-time processing applications
(over 30 FPS).
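Throughput figures of this kind are typically obtained by timing warm inference over the test images. A minimal sketch, assuming an Ultralytics-style callable model and a list of test image paths, is:

```python
import time

def measure_fps(model, image_paths, warmup=10):
    """Average frames per second of a detector over a set of test images."""
    for path in image_paths[:warmup]:      # warm-up pass excludes one-off setup cost
        model(path, verbose=False)
    start = time.perf_counter()
    for path in image_paths:
        model(path, verbose=False)
    return len(image_paths) / (time.perf_counter() - start)
```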
In summary, when all improvements are combined, their effects are further amplified,
significantly enhancing the overall performance of strawberry ripeness detection without
sacrificing real-time processing capabilities.
Model | Precision | Recall | mAP50 | Model Size (M)
YOLOv5 | 86.50% | 89.40% | 80.26% | 3.74
YOLOv7 | 77.10% | 83.70% | 80.26% | 71.3
RetinaNet | 64.42% | 92.59% | 80.26% | 139
Faster R-CNN | 65.26% | 89.22% | 80.26% | 108
YOLOv8 | 85.90% | 86.60% | 86.25% | 5.91
CES-YOLOv8 | 88.20% | 89.80% | 88.99% | 41.3
Some of the inference results from the different models are shown in Figure 10, where
the red arrows indicate missed detections, the blue arrows indicate false detections, and
the yellow arrows indicate duplicate detections. The results show that the CES-YOLOv8
model surpasses other models in accurately identifying occluded targets and reducing false
positives and false negatives. YOLOv8 and YOLOv5 have significant issues with missed
detections in occluded strawberries. Although YOLOv7 has a lower miss rate, it suffers
from multiple duplicate detections, which could affect accuracy and increase the detection
time. Faster R-CNN and RetinaNet perform poorly in terms of accuracy and false detections,
especially RetinaNet, which has numerous duplicate detection issues, making it impractical
for real-world applications. Overall, CES-YOLOv8 is advantageous for reducing common
issues in agricultural production applications, such as fruit and leaf occlusions, and it has
significantly improved accuracy in fruit identification and positioning.
4. Discussion
As the global population increases and land resources become increasingly scarce, im-
proving agricultural production efficiency has become particularly important. Automated
harvesting robot technology, as a key advancement in smart agriculture, holds significant
value in increasing harvesting efficiency and reducing labor costs [25]. Traditional methods
for determining fruit ripeness are limited, primarily relying on a simple recognition of color
and size, making it difficult to adapt to variable growing conditions and the significant
color differences in fruits during the ripening process. Existing studies have proposed mul-
tiple solutions, but there are significant environmental dependency issues and difficulties
in balancing model accuracy and efficiency [26]. Therefore, developing an efficient and
accurate algorithm for detecting strawberry ripeness is crucial.
In response to these shortcomings, in this study, a CES-YOLOv8 network model based
on improvements to YOLOv8 was proposed; it enhances the accuracy and robustness
of strawberry ripeness recognition while balancing the model’s use of computational
resources. During the data collection phase, the impacts of different lighting conditions,
varying levels of occlusion, and different angles on image acquisition were comprehensively
considered, greatly enhancing the model’s adaptability and robustness in real agricultural
production environments. Additionally, by replacing some C2f modules of the backbone
layer with ConvNeXt V2 modules and introducing ECA attention in the layer above
the SPPF, the model's feature diversity and generalization capability were effectively
enhanced, improving performance while reducing memory resource usage and increasing
the accuracy of fruit detection in complex environments. The experimental results show
that the improved network achieved significant performance enhancements in strawberry
ripeness recognition tasks, with an accuracy of 88.2%, a recall of 89.8%, an mAP50 of 92.10%,
and an F1 score of 88.99%, representing improvements of 4.8%, 2.9%, 2.05%, and 3.88%,
respectively, over the corresponding values of the original YOLOv8 network.
The improvements in precision and recall mean that the model is more accurate in
detecting the ripeness of strawberries, reducing misidentification. In practical applications,
this can prevent fruits from being harvested at the wrong time, ensuring product quality
and market value. A higher mAP50 indicates that the model maintains high performance
in real-time dynamic environments. A higher F1 score ensures that all ripe strawberries are
correctly classified, which is important for actual production. As the model is improved,
the increase in feature computation also leads to a reduction in FPS, but it still far surpasses
the need for real-time processing. An FPS of 184.52 supports the rapid scanning of fruits
using robots without slowing down production due to the image processing speed. Overall,
the improved model enhances various performance metrics without sacrificing real-time
processing capabilities. These results not only validate the effectiveness of the improve-
ments but also demonstrate that the methods proposed in this study can accurately identify
strawberry ripeness in complex environments, significantly advancing the development of
automated harvesting technologies.
However, this study has certain limitations. First, although the model in this study
performs well in detecting the ripeness of strawberries, its generality and applicability to
other types of fruits or crops need further verification. Second, considering the complexity
of agricultural production, such as the impact of climate conditions and soil types in
different regions on fruit ripeness, it is necessary to explore the adaptability and robustness
of the model under more diverse conditions [27]. Additionally, although the experiment
considered various issues such as different lighting conditions, degrees of occlusion, and
angles, it overlooked more physiological details; future work can further investigate the
specific mechanisms through which these factors affect model performance and how to
further optimize the model in order to address these challenges.
To address the aforementioned shortcomings, future research will further explore
the generalization capabilities of the model, especially for different types of fruits, and
for ripeness detection at various growth stages. Moreover, given the complexities of
actual agricultural production, future research should focus more on the adaptability and
robustness of the model under real field conditions, including its response to different
climatic conditions and pest impacts. Through continuous optimization and improvement,
more technical support for the development of smart agriculture and automated harvesting
technologies can be provided, contributing to the enhancement of agricultural production
efficiency and sustainable development.
5. Conclusions
Addressing the current difficulty of balancing model accuracy and performance in
ripeness detection using automated harvesting robots, this study focused on strawberries
and proposed an improved CES-YOLOv8 network structure. During the data collection
phase, the effects of different lighting conditions, degrees of occlusion, and angles were
considered, and image data covering these scenarios were collected, effectively enhanc-
ing the model’s applicability and robustness in real agricultural environments. Targeted
improvements were made to the YOLOv8 object detection network, including the replace-
ment of some C2f modules in the backbone layer with ConvNeXt V2 modules and the
introduction of ECA attention in the layer above the SPPF. The improvements enhanced
the model’s feature diversity and generalization ability, boosting its performance. The
model’s accuracy, recall, mAP50, and F1 score reached 88.20%, 89.80%, 92.10%, and 88.99%,
respectively, showing increases of 4.8%, 2.9%, 2.05%, and 3.88%, respectively, compared to
the corresponding values of the initial YOLOv8 structure. While improving the accuracy
and precision of strawberry ripeness detection, the enhancements also effectively reduced
the problems of missed and duplicate detections. This study provides an efficient and
precise ripeness detection technology for automated harvesting robots in the field of smart
agriculture, which advances the field of smart agriculture, enhances agricultural production
efficiency, and supports sustainable agricultural development.
Funding: This research was funded through the following grant: Key Technology for Digitization of
Characteristic Agricultural Industries in Fujian Province (XTCXGC2021015).
Data Availability Statement: Since the project presented in this research has not yet concluded, the
experimental data will not be disclosed for the time being. Should readers require any supporting
information, they may contact the corresponding author via email.
Conflicts of Interest: The authors declare no conflicts of interest. The funders had no role in the design
of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or
in the decision to publish the results.
References
1. Rehman, A.; Saba, T.; Kashif, M.; Fati, S.M.; Bahaj, S.A.; Chaudhry, H. A revisit of internet of things technologies for monitoring
and control strategies in smart agriculture. Agronomy 2022, 12, 127. [CrossRef]
2. Yamamoto, S.; Hayashi, S.; Yoshida, H.; Kobayashi, K. Development of a stationary robotic strawberry harvester with a picking
mechanism that approaches the target fruit from below. Jpn. Agric. Res. Q. 2014, 48, 261–269. [CrossRef]
3. Hayashi, S.; Yamamoto, S.; Saito, S.; Ochiai, Y.; Kamata, J.; Kurita, M.; Yamamoto, K. Field operation of a movable strawberry-
harvesting robot using a travel platform. Jpn. Agric. Res. Q. 2014, 48, 307–316. [CrossRef]
4. Kaur, H.; Sawhney, B.K.; Jawandha, S.K. Evaluation of plum fruit maturity by image processing techniques. J. Food Sci. Technol.
2018, 55, 3008–3015. [CrossRef] [PubMed]
5. Villaseñor-Aguilar, M.J.; Botello-Álvarez, J.E.; Pérez-Pinal, F.J.; Cano-Lara, M.; León-Galván, M.F.; Bravo-Sánchez, M.-G.; Barranco-
Gutierrez, A.I. Fuzzy classification of the maturity of the tomato using a vision system. J. Sens. 2019, 2019, 3175848. [CrossRef]
6. Yin, Y.; Guo, C.; Shi, H.; Zhao, J.; Ma, F.; An, W.; He, X.; Luo, Q.; Cao, Y.; Zhan, X. Genome-wide comparative analysis of the
R2R3-MYB gene family in five solanaceae species and identification of members regulating carotenoid biosynthesis in wolfberry.
Int. J. Mol. Sci. 2022, 23, 2259. [CrossRef] [PubMed]
7. Parvathi, S.; Selvi, S.T. Detection of maturity stages of coconuts in complex background using Faster R-CNN model. Biosyst. Eng.
2021, 202, 119–132. [CrossRef]
8. Wang, L.M.; Jiang, Y. Automatic grading of banana ripeness based on deep learning. Food Mach. 2022, 38, 149–154. [CrossRef]
9. Wang, Z.; Ling, Y.; Wang, X.; Meng, D.; Nie, L.; An, G.; Wang, X. An improved Faster R-CNN model for multi-object tomato
maturity detection in complex scenarios. Ecol. Inform. 2022, 72, 101886. [CrossRef]
10. Chen, F.; Zhang, X.; Zhu, X.; Li, Z.; Lin, J. Detection of olive fruit maturity based on improved EfficientDet. Trans. Chin. Soc. Agric.
Eng. 2022, 38, 158–166.
11. Wang, C.; Wang, C.; Wang, L.; Wang, J.; Liao, J.; Li, Y.; Lan, Y. A lightweight cherry tomato maturity real-time detection algorithm
based on improved YOLOV5n. Agronomy 2023, 13, 2106. [CrossRef]
12. Kazama, E.H.; Tedesco, D.; Carreira, V.d.S.; Júnior, M.B.; de Oliveira, M.F.; Ferreira, F.M.; Junior, W.M.; da Silva, R.P. Monitoring
coffee fruit maturity using an enhanced convolutional neural network under different image acquisition settings. Sci. Hortic.
2024, 328, 112957. [CrossRef]
13. Megalingam, R.K.; Manoharan, S.K.; Maruthababu, R.B. Integrated fuzzy and deep learning model for identification of coconut
maturity without human intervention. Neural Comput. Appl. 2024, 1–13. [CrossRef]
14. Zhang, W.; Liu, Y.; Chen, K.; Li, H.; Duan, Y.; Wu, W.; Shi, Y.; Guo, W. Lightweight fruit-detection algorithm for edge computing
applications. Front. Plant Sci. 2021, 12, 740936. [CrossRef] [PubMed]
15. Xiao, Z.Q.; He, J.X.; Chen, D.B.; Zhan, Y.; Lu, Y.L. Automatic classification method of rock spectra based on twin network model.
Spectrosc. Spectr. Anal. 2024, 44, 558–562.
16. Wang, Y.T.; Zhou, H.Q.; Yan, J.X.; He, C.; Huang, L.L. Progress in computational optics research based on deep learning algorithms.
Chin. J. Lasers 2021, 48, 1918004.
17. Zhao, J.D.; Zhen, G.Y.; Chu, C.Q. Drone image target detection algorithm based on YOLOv8. Comput. Eng. 2024, 50, 113–120.
18. Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. Convnext v2: Co-designing and scaling convnets with masked
autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada,
17–24 June 2023; pp. 16133–16142.
19. Li, Y.; He, Z.; Ma, J.; Zhang, Z.; Zhang, W.; Chatterjee, P.; Pamucar, D. A Novel Feature Aggregation Approach for Image Retrieval
Using Local and Global Features. CMES-Comput. Model. Eng. Sci. 2022, 131, 239–262. [CrossRef]
20. Zhu, M.L.; Ren, Y.Z. Screw surface defect detection based on neural networks. J. Ordnance Equip. Eng. 2024, 45, 224–231.
21. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020;
pp. 11534–11542.
22. Li, G.M.; Gong, H.B.; Yuan, K. Research on Sichuan pepper cluster detection based on lightweight YOLOv5s. Chin. J. Agric. Mech.
2023, 44, 153.
23. Gevorgyan, Z. SIoU Loss: More Powerful Learning for Bounding Box Regression. arXiv 2022. Available online: https://arxiv.org/abs/2205.12740 (accessed on 16 April 2024).
24. Gu, Z.; Zhu, K.; You, S. YOLO-SSFS: A Method Combining SPD-Conv/STDL/IM-FPN/SIoU for Outdoor Small Target Vehicle
Detection. Electronics 2023, 12, 3744. [CrossRef]
25. Raja, V.; Bhaskaran, B.; Nagaraj, K.; Sampathkumar, J.; Senthilkumar, S. Agricultural harvesting using integrated robot system.
Indones. J. Electr. Eng. Comput. Sci. 2022, 25, 152. [CrossRef]
26. Yoshida, T.; Onishi, Y.; Kawahara, T.; Fukao, T. Automated harvesting by a dual-arm fruit harvesting robot. Robomech J. 2022,
9, 19. [CrossRef]
27. Vincent, D.R.; Deepa, N.; Elavarasan, D.; Srinivasan, K.; Chauhdary, S.H.; Iwendi, C. Sensors driven AI-based agriculture
recommendation model for assessing land suitability. Sensors 2019, 19, 3667. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.