
agronomy

Article
CES-YOLOv8: Strawberry Maturity Detection Based on the
Improved YOLOv8
Yongkuai Chen 1,† , Haobin Xu 1,2,† , Pengyan Chang 1 , Yuyan Huang 1 , Fenglin Zhong 2 , Qi Jia 4 , Lingxiao Chen 5 ,
Huaiqin Zhong 3, * and Shuang Liu 2, *

1 Institute of Digital Agriculture, Fujian Academy of Agricultural Sciences, Fuzhou 350003, China;
[email protected] (Y.C.); [email protected] (H.X.); [email protected] (P.C.);
[email protected] (Y.H.)
2 College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, China;
[email protected]
3 Crops Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou 350003, China
4 Jiuquan Academy of Agriculture Sciences, Jiuquan 735099, China; [email protected]
5 Fujian Agricultural Machinery Extension Station, Fuzhou 350002, China; [email protected]
* Correspondence: [email protected] (H.Z.); [email protected] (S.L.)
† These authors contributed to the work equally and should be regarded as co-first authors.

Abstract: Automatic harvesting robots are crucial for enhancing agricultural productivity, and precise fruit maturity detection is a fundamental and core technology for efficient and accurate harvesting. Strawberries are distributed irregularly, and their images contain a wealth of characteristic information, encompassing both simple, intuitive features and deeper abstract meanings. These complex features pose significant challenges to robots in determining fruit ripeness. To increase the precision, accuracy, and efficiency of robotic fruit maturity detection methods, a strawberry maturity detection algorithm based on an improved network structure, CES-YOLOv8, derived from YOLOv8, was developed in this study. Initially, to reflect the characteristics of actual planting environments, image data were collected under various lighting conditions, degrees of occlusion, and angles. Subsequently, parts of the C2f module in the YOLOv8 model's backbone were replaced with the ConvNeXt V2 module to enhance the capture of features in strawberries of varying ripeness, and the ECA attention mechanism was introduced to further improve feature representation capability. Finally, the angle compensation and distance compensation of the SIoU loss function were employed to enhance the IoU, enabling the rapid localization of the model's prediction boxes. The experimental results show that the improved CES-YOLOv8 model achieves an accuracy, recall rate, mAP50, and F1 score of 88.20%, 89.80%, 92.10%, and 88.99%, respectively, in complex environments, indicating improvements of 4.8%, 2.9%, 2.05%, and 3.88%, respectively, over those of the original YOLOv8 network. This algorithm provides technical support for automated harvesting robots to achieve efficient and precise automated harvesting. Additionally, the algorithm is adaptable and can be extended to other fruit crops.

Keywords: automatic harvesting robots; CES-YOLOv8; strawberry maturity

Citation: Chen, Y.; Xu, H.; Chang, P.; Huang, Y.; Zhong, F.; Jia, Q.; Chen, L.; Zhong, H.; Liu, S. CES-YOLOv8: Strawberry Maturity Detection Based on the Improved YOLOv8. Agronomy 2024, 14, 1353. https://doi.org/10.3390/agronomy14071353

Academic Editors: Yanbo Huang and Alberto San Bautista

Received: 11 May 2024; Revised: 26 May 2024; Accepted: 20 June 2024; Published: 22 June 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

With the dual pressures of global population growth and a gradual reduction in arable land, increasing agricultural production has become an important societal challenge. The implementation of smart agriculture is a key solution to this challenge, in which the use of digital information technology and intelligent equipment is crucial for achieving efficient and sustainable agricultural development [1]. Among the many applications of smart agriculture, automated harvesting robot technology can replace manual labor, significantly increasing harvesting efficiency, which is especially important in regions with high labor costs or labor shortages. Fruit maturity detection is a fundamental and critical technology for the efficient and accurate performance of automated harvesting robots.
Traditional automated harvesting systems mostly rely on simple color and size recogni-
tion for determining fruit maturity. Yamamoto and others proposed an algorithm based on
color threshold segmentation to isolate strawberry targets [2]. Hayashi and others designed
a strawberry-harvesting robot that also uses a color threshold segmentation algorithm to
detect strawberries and estimate maturity [3]. Kaur and others utilized external quality fea-
tures such as color, texture, and size to detect the maturity of plums [4]. Villaseñor-Aguilar
and others proposed a new fuzzy classification framework based on the RGB color model
to categorize the maturity of tomatoes [5]. Although these methods have resolved the
maturity detection issue to some extent, they have stringent requirements for the detection
environment and growth conditions. However, in actual production, the fruit maturation
process is influenced by many factors, such as the fruit variety and growth conditions (such
as light and humidity), which affect the color at maturity. Moreover, the color of the fruit
may also change due to shading, pests, and diseases, among other reasons [6], which can
affect color recognition accuracy.
By automatically learning the intrinsic connections and patterns within annotated
datasets, deep learning technologies can be used to effectively extract deep features from
images; they exhibit especially high accuracy and rapid identification in complex scene
target detection and classification. In recent years, deep learning technologies have been
rapidly integrated into various agricultural research fields, including fruit maturity de-
tection in complex environments. Subramanian Parvathi and others have improved the
region-based Faster R-CNN model for detecting the maturity of coconuts against complex
backgrounds [7]. Wang Lingmin and colleagues utilized an enhanced AlexNet model
to classify the maturity of bananas, achieving an accuracy of 96.67% [8]. Zan Wang and
associates designed an improved Faster R-CNN model, MatDet, for the detection of tomato
maturity. Experimental results indicate that, in complex scenes, the proposed model
achieved optimal detection results under conditions of branch occlusion, fruit overlapping,
and lighting effects, with a mean average precision (mAP) of 96.14% [9]. Chen Fengjun
and colleagues proposed an improved method for detecting the maturity of olive fruits
using EfficientDet, and the model’s precision P, recall rate R, and mean average precision
mAP in the test set were 92.89%, 93.59%, and 94.60%, respectively [10]. Wang Congyue
and others introduced an enhanced object detection algorithm based on YOLOv5n for the
real-time identification and maturity detection of cherry tomatoes, achieving an average
accuracy of 95.2% [11]. Elizabeth Haruna Kazama and others used an enhanced YOLOv8
model modified through convolution blocks (RFCAConv) to classify the maturity stages of
coffee fruits, with the model reaching an mAP@0.5 of 74.20% [12]. Megalingam, Rajesh
Kannan, and colleagues proposed an integrated fuzzy deep learning model (IFDM) for
classifying the maturity levels of coconuts. The study showed that the real-time learning
model achieved an accuracy of 86.3% in classifying coconut maturity levels [13]. Currently,
fruit maturity detection methods based on convolutional neural networks have rapidly
developed, yet issues remain. Methods with high detection accuracy often have high com-
putational complexity and slow detection speeds, while methods that are computationally
simpler and faster tend to have lower accuracy [14].
To address the aforementioned issues, this study used strawberries as its research
subject and basis for improvements on the YOLOv8 object detection network, propos-
ing a novel strawberry ripeness detection algorithm named CES-YOLOv8 to enhance the
accuracy of ripeness detection. The algorithm enhances the accuracy and robustness of
strawberry ripeness recognition via automated harvesting robots under various environ-
mental conditions without sacrificing real-time processing capabilities, providing technical
support for efficient and precise automated harvesting. This research not only helps en-
hance the practicality and economic benefits of automated harvesting technology but also
offers technical references for smart agriculture in precision agricultural management,
harvesting, and sorting.
2. Materials and Methods

2.1. Classification of Strawberry Ripeness

In this study, strawberry ripeness was classified into four levels based on the growth and color changes of the fruit, as shown in Table 1.

Level 1 is the unripe stage, characterized by hard flesh and a green surface on the skin. Level 2 is the white ripe stage, where the fruit begins to change from green to white, with pale red spots starting to appear in some areas, and Level 3 is the color-changing stage, where the color change of the fruit becomes more pronounced, with the red coloring starting to spread and cover more of the fruit surface, although some areas remain white or pale red.

Table 1. Classification of strawberry maturity levels.

Grade Label Description
1 Immature_stage Fruit remains green
2 Mature_white_stage Fruit begins to change from green to white; some varieties start to show light red spots
3 Color_turning_stage Red starts to spread and cover more of the fruit surface, but there are still areas that are white or light red
4 Ripe_stage The color of the strawberries uniformly turns bright red
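For later reference, the four grades in Table 1 correspond one-to-one to the detection classes used throughout the experiments. A minimal label map is sketched below; the integer ids are an assumption (they follow the table order), since the paper does not state them.

```python
# Maturity classes from Table 1 as a YOLO-style label map.
# The integer ids are assumed to follow the table order; the paper does not list them.
MATURITY_CLASSES = {
    0: "Immature_stage",       # fruit remains green
    1: "Mature_white_stage",   # green turning white, light red spots appearing
    2: "Color_turning_stage",  # red spreading, some white/light red areas remain
    3: "Ripe_stage",           # uniformly bright red
}
```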

2.2. Image Collection and Dataset Construction

The experimental data for this study were collected from the China Israel Demonstration Farm polytunnel greenhouse of the Fujian Academy of Agricultural Sciences. The strawberry varieties targeted in this experiment included Hongyan, Xiangye, and Yuexiu, among others.

2.2.1. Strawberry Image Acquisition

Strawberry image collection ensured that the dataset reflected the actual planting environment characteristics, such as irregular distribution, uneven lighting, and mutual occlusion between leaves and fruits. The collection times were set from 8:00 AM to 5:00 PM, and a Sony (A550) camera was used to capture images of strawberries from various planting rack positions in the greenhouse without specifying the shooting angles. All strawberry images were captured under natural light conditions, and they totaled 546 original images with varying lighting conditions, degrees of occlusion, and angles, as shown in Figure 1.
Figure 1. Captured images of strawberries. (a) Dispersed, (b) with light + overlapping fruits, (c) leaves blocking light, and (d) leaves obstructing.

2.2.2. Data Enhancement and Expansion

If the training sample size is insufficient during the model training process, it may cause overfitting [15], due to which the model performs well with the training data but poorly with the new test data. This is due to the insufficiency of training data causing the model to over-adapt to the characteristics of the training data and fail to generalize to other data. To increase feature diversity in order to prevent overfitting, achieve model convergence during training, and enhance the model's generalization ability and robustness, a series of data augmentation measures were implemented to expand the sample size. This study employed a random combination of data augmentation techniques such as mirroring, brightness adjustment, Gaussian blur, contrast adjustment, and random translation, with some of the augmented samples shown in Figure 2. This approach effectively expanded the dataset size, ultimately obtaining 2722 images that had undergone enhancement processing.

Figure 2. Data augmentation effects. (a) Original image, (b) mirrored flip, (c) contrast–brightness adjustment, and (d) random translation.
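To make the random-combination scheme concrete, the sketch below applies each named operation with an independent probability, assuming OpenCV. The probabilities and parameter ranges are assumptions (the study does not report them), and in a detection setting the mirroring and translation operations would also have to remap the bounding boxes.

```python
import random

import cv2
import numpy as np

def augment(image: np.ndarray) -> np.ndarray:
    """Randomly combine mirroring, brightness/contrast, Gaussian blur, and translation."""
    if random.random() < 0.5:  # mirrored flip
        image = cv2.flip(image, 1)
    if random.random() < 0.5:  # contrast-brightness adjustment
        alpha = random.uniform(0.7, 1.3)   # contrast gain (assumed range)
        beta = random.uniform(-30, 30)     # brightness offset (assumed range)
        image = cv2.convertScaleAbs(image, alpha=alpha, beta=beta)
    if random.random() < 0.5:  # Gaussian blur
        image = cv2.GaussianBlur(image, (5, 5), 0)
    if random.random() < 0.5:  # random translation
        tx = random.randint(-40, 40)
        ty = random.randint(-40, 40)
        m = np.float32([[1, 0, tx], [0, 1, ty]])
        image = cv2.warpAffine(image, m, (image.shape[1], image.shape[0]))
    return image
```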
2.2.3. Data Labeling and Dataset Segmentation

Labeling software (labelImg v1.8.1) was used to annotate the strawberries with the following rules: (1) the smallest circumscribing rectangle was used as the annotation box, ensuring that the target was completely within the box and close to the boundary; (2) each target was marked via an independent box, with no sharing of boxes among multiple targets; (3) fruits that obscured each other in the image but did not affect the manual determination of ripeness levels were annotated separately; and (4) fruits that were heavily occluded (over 95% obscured) or too distant, causing the main body of the fruits to be at the edge and severely blurred, making it difficult to discern ripeness levels, were not labeled, with some labeled images shown in Figure 3. The annotation results were saved in a .txt file in YOLO format. The dataset was divided into training and test sets at a 4:1 ratio, resulting in 2177 images for training and 545 for testing.

Figure 3. Annotated image.
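A minimal sketch of the annotation format and the 4:1 split described above; the directory layout and file extension are assumptions rather than the authors' actual paths.

```python
import random
from pathlib import Path

def read_yolo_labels(label_path: Path) -> list[tuple[int, float, float, float, float]]:
    """Parse a YOLO-format .txt file: one 'class x_center y_center width height'
    line per annotated strawberry, with coordinates normalized to [0, 1]."""
    boxes = []
    for line in label_path.read_text().splitlines():
        cls, cx, cy, w, h = line.split()
        boxes.append((int(cls), float(cx), float(cy), float(w), float(h)))
    return boxes

def split_dataset(image_dir: str, train_ratio: float = 0.8, seed: int = 0):
    """4:1 train/test split; with the 2722 augmented images this yields 2177/545."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * train_ratio)
    return images[:n_train], images[n_train:]
```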
2.3. Strawberry Ripeness Detection Network Structure

In this study, the YOLOv8 object detection algorithm was selected for experimentation, and it consists of three parts: the backbone, neck, and head [16]. The backbone uses the Darknet53 architecture, which includes basic convolution units (Conv), spatial pyramid pooling modules (SPPFs) for local and global feature fusion, and C2f modules to enhance the network depth and receptive fields. The neck utilizes a PAN-FPN structure, employing C2f modules to merge feature maps of different sizes. The head features a decoupled head structure, which separates classification from detection and employs an anchor-free mechanism during detection. The loss function computation uses the task-aligned assignment strategy for positive sample allocation, combining classification loss (varifocal loss), regression loss (complete-IoU), and deep feature loss in a ternary weighted combination [17].

This study proposes an improved CES-YOLOv8 network structure based on the YOLOv8 model structure. The improvements include the incorporation of the ConvNeXt V2 module to replace the C2f modules in the fifth and seventh layers of the YOLOv8 model's backbone. Sparse convolution was employed to process partially occluded inputs, enhancing feature diversity while improving computational efficiency and reducing memory usage. An ECA attention mechanism was introduced above the SPPF (Spatial Pyramid Pooling) layer to enhance the learning of attention relationships between network channels, improving the detection accuracy of adjacent mature fruits and occluded fruits. Finally, angle and distance compensations in the SIoU (Smoothed Intersection over Union) loss function were used to improve the IoU, enabling the rapid positioning of the model's prediction boxes. The improved network structure is shown in Figure 4. The specific structures and algorithms of each module are detailed in the following subsections.

Figure 4. Improved network structure diagram.
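To make the layer-level changes concrete, the sketch below lists the modified backbone as the [from, repeats, module, args] rows an Ultralytics-style model file would contain. The placement is our reading of Figure 4 and the description above, not the authors' released configuration; "ConvNeXtV2" and "ECA" stand for custom modules (sketched in the following subsections) that would need to be registered with the framework.

```python
# Hedged sketch of the CES-YOLOv8 backbone plan, following the stock YOLOv8 layout.
# Rows 4 and 6 are the fifth and seventh layers whose C2f blocks are replaced;
# the ECA row ahead of the SPPF reflects "above the SPPF layer" in the text.
CES_BACKBONE = [
    # [from, repeats, module, args]
    [-1, 1, "Conv",       [64, 3, 2]],    # 0  P1/2
    [-1, 1, "Conv",       [128, 3, 2]],   # 1  P2/4
    [-1, 3, "C2f",        [128, True]],   # 2
    [-1, 1, "Conv",       [256, 3, 2]],   # 3  P3/8
    [-1, 6, "ConvNeXtV2", [256]],         # 4  C2f -> ConvNeXt V2 (fifth layer)
    [-1, 1, "Conv",       [512, 3, 2]],   # 5  P4/16
    [-1, 6, "ConvNeXtV2", [512]],         # 6  C2f -> ConvNeXt V2 (seventh layer)
    [-1, 1, "Conv",       [1024, 3, 2]],  # 7  P5/32
    [-1, 3, "C2f",        [1024, True]],  # 8
    [-1, 1, "ECA",        [1024]],        # 9  channel attention above the SPPF
    [-1, 1, "SPPF",       [1024, 5]],     # 10
]
```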

2.3.1. ConvNeXt V2 Module
ConvNeXt V2, introduced by Sanghyun Woo and others [18], is a novel convolutional neural network architecture that incorporates a fully convolutional masked autoencoder (FCMAE) and a lightweight ConvNeXt decoder, as shown in Figure 5. The encoder uses sparse convolutions to process only the visible parts of the input, reducing the pretraining computational costs and allowing the model to use the remaining contextual information to predict missing parts, thus enhancing its ability to learn and understand visual data. Additionally, a global response normalization (GRN) layer is introduced in the convolutional network to enhance feature competition between the channels. The GRN enhances feature contrast, and through selective steps of global feature aggregation, normalization, and calibration, it helps prevent feature collapse and thereby improves the model's expressive and generalization capabilities [19]. This module enhances the performance of pure convolutional neural networks in various downstream tasks, and the module structure is shown in Figure 6.

In the detection of strawberry ripeness, ConvNeXt V2 randomly masks parts of the strawberry image. Through processing with sparse convolution, it predicts the masked areas to capture details within the strawberry image, accurately capturing features while reducing computational costs without sacrificing performance. Concurrently, the GRN layer enhances the competition among feature channels, helping the model better distinguish subtle differences between strawberries of different maturities, thus improving recognition accuracy.

Figure 5. FCMAE full convolutional mask autoencoder.

Figure 6. ConvNeXt V2 module.
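The GRN layer is compact enough to state directly. The sketch below follows the published ConvNeXt V2 formulation [18]: global L2 aggregation over spatial positions, divisive normalization across channels, then learnable calibration with a residual path. It assumes channels-last input, as in the reference implementation, and is not the authors' code.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global response normalization, as introduced for ConvNeXt V2.

    Expects channels-last input of shape (N, H, W, C)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)  # aggregate: per-channel global response
        nx = gx / (gx.mean(dim=-1, keepdim=True) + 1e-6)   # normalize: response relative to channel mean
        return self.gamma * (x * nx) + self.beta + x       # calibrate, with residual connection
```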

2.3.2. ECA Attention Mechanism

Attention mechanisms dynamically adjust the weights of the input features within a network [20], enabling better perception of the distinctive features in images and facilitating rapid target localization. This mechanism has been widely adopted in computer vision. The efficient channel attention (ECA) module (Figure 7) avoids the dimension reduction found in the squeeze-and-excitation (SE) module. It learns channel attention directly after global average pooling using a one-dimensional convolution, maintaining the dimensionality of the channels [21]. A key feature of the ECA module is its adaptive method for determining the size (k) of the one-dimensional convolutional kernel, which aligns the local cross-channel interaction range with the channel dimensions, facilitating efficient learning without manual adjustments. Due to its light weight and minimal additional parameters, the ECA module significantly reduces model complexity while maintaining performance.

Figure 7. ECA attention mechanism structure.

In this study, an ECA attention mechanism was added above the SPPF layer of the backbone network. The ECA attention mechanism avoids dimensionality reduction, preserving more original feature information of strawberries at different maturity levels, thereby enhancing feature-representation capabilities. Local interactions of one-dimensional convolution enable the model to focus more on key feature areas related to maturity and automatically adjust the range of the receptive field based on different feature layers, allowing the model to flexibly handle changes in the strawberry-ripening process.
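A sketch of the ECA module as summarized above [21]: global average pooling, then a single 1D convolution across channels whose kernel size k is derived adaptively from the channel count. The constants gamma = 2 and b = 1 are the common choices from the ECA-Net paper, not values reported in this study.

```python
import math

import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: channel weights from a k-sized 1D convolution,
    with k chosen adaptively from the number of channels (no dimension reduction)."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1  # kernel size must be odd
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pool(x)                                # (N, C, 1, 1) channel descriptor
        y = self.conv(y.squeeze(-1).transpose(-1, -2))  # 1D conv across the channel axis
        y = torch.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y.expand_as(x)                       # reweight the input channels
```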

2.3.3. SIoU Loss Function


The CIoU loss function primarily relies on the aggregation of bounding box regression,
but it overlooks the misalignment issue between the expected true boxes and predicted
boxes [22], a flaw that can slow convergence and reduce training efficiency.
To address the issue of IoU calculations when ground truth boxes and predicted boxes
overlap, this study employed the SIoU loss function proposed by Gevorgyan [23]. As
shown in Figure 8, in addition to considering the distance σ, overlap area, and aspect ratio
between the predicted boxes, the SIoU loss function also takes into account the vector
angles α and β between the true box, B_GT, and the predicted box, B. It incorporates
an angular penalty term, redefining the related loss function [24]. In the detection of
strawberry ripeness, the angle cost introduced via SIoU can reduce the distance between
prediction and ground truth boxes by optimizing angles, thereby indirectly improving IoU.
Additionally, the distance cost, aside from considering straight-line distances, also adjusts
angles, increasing the intersection between prediction and ground truth boxes over a wider
range. Finally, SIoU dynamically adjusts the weights of distance and angle to accommodate
different overlaps of strawberries, preventing gradient-vanishing issues.
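For concreteness, the sketch below implements the SIoU regression loss following Gevorgyan's published formulation [23] (an angle cost that steers regression toward the nearest axis, a distance cost rescaled by that angle, and a shape cost on top of IoU), rather than the authors' training code; theta = 4 in the shape cost is a common choice, not a value reported in this paper.

```python
import torch

def siou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """SIoU loss for (x1, y1, x2, y2) boxes of shape (N, 4), after Gevorgyan."""
    # plain IoU term
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # centers and enclosing-box extents
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # angle cost: favors aligning the prediction with the nearest axis first
    sigma = torch.sqrt((cx_t - cx_p) ** 2 + (cy_t - cy_p) ** 2 + eps)
    sin_alpha = (torch.abs(cy_t - cy_p) / sigma).clamp(-1, 1)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - torch.pi / 4) ** 2

    # distance cost, rescaled by the angle cost
    gamma = 2 - angle
    rho_x = ((cx_t - cx_p) / (cw + eps)) ** 2
    rho_y = ((cy_t - cy_p) / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # shape cost (theta = 4 is a common choice)
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    omega_w = torch.abs(w_p - w_t) / (torch.max(w_p, w_t) + eps)
    omega_h = torch.abs(h_p - h_t) / (torch.max(h_p, h_t) + eps)
    shape = (1 - torch.exp(-omega_w)) ** 4 + (1 - torch.exp(-omega_h)) ** 4

    return (1 - iou + (dist + shape) / 2).mean()
```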

Figure 8. Schematic diagram of SIoU loss function.

2.4. Model Evaluation Metrics

The process used to determine the ripeness of strawberry fruits must consider both detection accuracy and performance. For model detection accuracy, precision, recall, and the F1 score are used as evaluation metrics. For model performance, the mAP50, which is the mean average precision at a threshold of 50%, was selected as the evaluation metric. The formulas are as follows:

Precision = \frac{TP}{TP + FP} \times 100\%  (1)

Recall = \frac{TP}{TP + FN} \times 100\%  (2)

F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \times 100\%  (3)

AP = \frac{\sum_{1}^{K} P \times R}{K}  (4)

mAP = \frac{\sum_{1}^{k} AP}{k}  (5)

TP (true positives) represents the number of actual positive samples predicted as positive. FP (false positives) represents the number of actual negative samples predicted as positive. FN (false negatives) represents the number of actual positive samples predicted as negative. TN (true negatives) represents the number of actual negative samples predicted as negative.
3. Experiments and Result Analysis
3.1. Experimental Environment Configuration and Network Parameters
The training and testing of this study’s model were performed on a computer equipped
with an Intel Core i7-13700K CPU at 3.4 GHz, 32 GB of RAM, and a Windows 10 (64-bit)
operating system accelerated by a GeForce RTX 4070 Ti GPU with 12 GB of VRAM. The
programming language used was Python 3.8.10, the deep learning framework was PyTorch
1.2.0, and the OpenCV version was 4.8. The initial learning rate was set to 0.001 to balance
the model’s convergence speed and learning efficiency, preventing instability due to rapid
convergence. Additionally, a momentum decay strategy was employed, with a value set
to 0.937, to speed up the learning process and avoid local minima. Finally, to enhance the
model’s generalization capability, a weight decay of 0.0005 was set, which helped reduce
the risk of overfitting.
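A hedged sketch of this configuration using the Ultralytics training API; the model and dataset file names are placeholders, and this reflects current Ultralytics usage rather than the authors' exact training script.

```python
from ultralytics import YOLO

# "ces-yolov8.yaml" and "strawberry.yaml" are placeholder names for the modified
# model definition and the dataset config (2177 training / 545 test images).
model = YOLO("ces-yolov8.yaml")
model.train(
    data="strawberry.yaml",
    lr0=0.001,            # initial learning rate
    momentum=0.937,       # momentum
    weight_decay=0.0005,  # weight decay to reduce overfitting risk
)
```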

3.2. CES-YOLOv8 Model Experiments

To validate the performance of the CES-YOLOv8 model, 545 strawberry images from the test set were evaluated. Table 2 presents the detection results of the algorithm at different maturity levels. According to Table 2, the algorithm achieved a precision of 88.2%, a recall of 89.8%, an mAP50 of 92.10%, and an F1 score of 88.99%.

Table 2. Detection results for different maturity levels.

Maturity Level Precision Recall mAP50 F1 Score
Immature_stage 80.80% 75.80% 82.70% 78.22%
Mature_white_stage 88.10% 91.80% 93% 89.91%
Color_turning_stage 90.10% 94.30% 94.60% 92.15%
Ripe_stage 93.60% 97.10% 98.10% 95.32%
Average 88.20% 89.80% 92.10% 88.99%
The improved algorithm further integrates the positional and semantic information of occluded fruits, enabling the extraction of the fine-grained features of the fruit phenotypes for the accurate detection of fruits at different maturity levels. Figure 9 clearly shows that the algorithm can accurately detect strawberries of different maturity levels in images with single and multiple targets and overlapping occlusions. Figure 9d shows that the algorithm can accurately identify small targets and severely occluded strawberries that have fallen onto planting racks. In summary, the improved CES-YOLOv8 model can accurately detect the maturity of fruits, exhibiting good detection performance for small targets, multiple targets, foliage occlusion, heavy fruiting, and varying lighting conditions.



Figure 9. Model detection example image. (a) Single-object detection, (b) discrete multi-object
detection, (c) minor obstruction detection, and (d) severe obstruction detection.

3.3. Ablation Study of the Improved CES-YOLOv8


To further validate the impact of the improved CES-YOLOv8 on model performance in
strawberry ripeness detection, the modified algorithm was systematically compared with
the initial algorithm in order to assess the impact of each improvement. The specific experi-
mental results are shown in Table 3, where “-” indicates no change to the original structure.

Table 3. Ablation experiment results.

Model Backbone Network Attention Mechanism Loss Function Precision Recall mAP50 F1 Score FPS
YOLOv8 - - - 83.40% 86.90% 90.10% 85.11% 220.92
YOLOv8 - - SIOU 85.90% 86.60% 90.30% 86.25% 224.74
YOLOv8 - ECA - 83.50% 87.50% 90.40% 85.45% 225.30
YOLOv8 ConvNeXt V2 - - 86.60% 87.80% 90.50% 87.20% 192.82
YOLOv8 - ECA SIOU 86.30% 87.60% 90.70% 86.95% 230.39
YOLOv8 ConvNeXt V2 - SIOU 87.70% 89.70% 91.70% 88.69% 191.16
YOLOv8 ConvNeXt V2 ECA - 86.80% 86.70% 91.10% 86.75% 186.06
YOLOv8 ConvNeXt V2 ECA SIOU 88.20% 89.80% 92.10% 88.99% 184.52

After the SIoU loss function was adopted, there were slight increases in the mAP50
and F1 values, with the greatest improvement in precision being 2.5%. This indicates that
the SIoU loss function, through more refined bounding box regression optimization, helped
the model achieve better performance. The application of the ECA attention mechanism

also improved the model’s precision, recall, and mAP50, demonstrating that the enhance-
ment of the attention mechanism strengthened the feature expression of specific channels,
significantly improving the accurate positioning and recognition of targets. Replacing
the backbone network with ConvNeXt V2 increased the precision, recall, mAP50, and F1
values by 3.2%, 0.9%, 0.4%, and 2.09%, respectively. This significant improvement indicates
that ConvNeXt V2, compared to the original YOLOv8 structure, better captures image
features and enhances model performance. Combining ConvNeXt V2, ECA, and SIoU
allowed the model’s performance to reach the highest values among all the metrics, with
precision, recall, mAP50, and F1 values of 88.20%, 89.80%, 92.10%, and 88.99%, respectively,
indicating increases of 4.8%, 2.9%, 2.05%, and 3.88%.
This study also calculated the frames per second (FPS) with the test dataset. The results
show that the original YOLOv8 had an FPS of 220.92. The introduction of the ECA attention
mechanism and SIoU loss function improved the processing speed, but the inclusion of
ConvNeXt V2 in the backbone network increased the image features and computational
time, leading to a noticeable decrease in FPS. Despite this, the FPS of the improved model
still reached 184.52, far exceeding the requirements for real-time processing applications
(over 30 FPS).
In summary, when all improvements are combined, their effects are further amplified,
significantly enhancing the overall performance of strawberry ripeness detection without
sacrificing real-time processing capabilities.

3.4. Comparative Analysis of Different Target Detection Networks


To qualitatively evaluate the detection results of the improved CES-YOLOv8 model,
this model was compared with Faster R-CNN, RetinaNet, YOLOv5, YOLOv7, and the
original YOLOv8 models in detecting strawberry images from a test set. The results,
as shown in Table 4, indicate that the improved CES-YOLOv8 model excels, with a precision, recall, and F1 score of 88.20%, 89.80%, and 88.99%, respectively, demonstrating that CES-YOLOv8 has high accuracy in detecting strawberry ripening.

Table 4. Comparative experiment results of different models.

Model Precision Recall F1 Score Model Size/M
YOLOv5 86.50% 89.40% 80.26% 3.74
YOLOv7 77.10% 83.70% 80.26% 71.3
Retinanet 64.42% 92.59% 80.26% 139
Faster RCNN 65.26% 89.22% 80.26% 108
YOLOv8 85.90% 86.60% 86.25% 5.91
CES-YOLOv8 88.20% 89.80% 88.99% 41.3

Some of the inference results from the different models are shown in Figure 10, where
the red arrows indicate missed detections, the blue arrows indicate false detections, and
the yellow arrows indicate duplicate detections. The results show that the CES-YOLOv8
model surpasses other models in accurately identifying occluded targets and reducing false
positives and false negatives. YOLOv8 and YOLOv5 have significant issues with missed
detections in occluded strawberries. Although YOLOv7 has a lower miss rate, it suffers
from multiple duplicate detections, which could affect accuracy and increase the detection
time. Faster R-CNN and RetinaNet perform poorly in terms of accuracy and false detections,
especially RetinaNet, which has numerous duplicate detection issues, making it impractical
for real-world applications. Overall, CES-YOLOv8 is advantageous for reducing common
issues in agricultural production applications, such as fruit and leaf occlusions, and it has
significantly improved accuracy in fruit identification and positioning.

Figure 10. Strawberry maturity detection images using different models. Columns from left to right: original image, CES-YOLOv8, YOLOv8, YOLOv5, YOLOv7, Faster RCNN, and RetinaNet.

4. Discussion
As the global population increases and land resources become increasingly scarce, im-
proving agricultural production efficiency has become particularly important. Automated
harvesting robot technology, as a key advancement in smart agriculture, holds significant
value in increasing harvesting efficiency and reducing labor costs [25]. Traditional methods
for determining fruit ripeness are limited, primarily relying on a simple recognition of color
and size, making it difficult to adapt to variable growing conditions and the significant
color differences in fruits during the ripening process. Existing studies have proposed mul-
tiple solutions, but there are significant environmental dependency issues and difficulties
in balancing model accuracy and efficiency [26]. Therefore, developing an efficient and
accurate algorithm for detecting strawberry ripeness is crucial.
In response to these shortcomings, in this study, a CES-YOLOv8 network model based
on improvements to YOLOv8 was proposed; it enhances the accuracy and robustness
of strawberry ripeness recognition while balancing the model’s use of computational
resources. During the data collection phase, the impacts of different lighting conditions,
varying levels of occlusion, and different angles on image acquisition were comprehensively
considered, greatly enhancing the model’s adaptability and robustness in real agricultural
production environments. Additionally, by replacing some C2f modules of the backbone
layer with ConvNeXt V2 modules and introducing ECA attention in the layer above
the SPFF, the model’s feature diversity and generalization capability were effectively
enhanced, improving performance while reducing memory resource usage and increasing
the accuracy of fruit detection in complex environments. The experimental results show
that the improved network achieved significant performance enhancements in strawberry
ripeness recognition tasks, with an accuracy of 88.2%, a recall of 89.8%, an mAP50 of 92.10%,
and an F1 score of 88.99%, representing improvements of 4.8%, 2.9%, 2.05%, and 3.88%,
respectively, over the corresponding values of the original YOLOv8 network.
The improvements in precision and recall mean that the model is more accurate in
detecting the ripeness of strawberries, reducing misidentification. In practical applications,
this can prevent fruits from being harvested at the wrong time, ensuring product quality
and market value. A higher mAP50 indicates that the model maintains high performance
in real-time dynamic environments. A higher F1 score ensures that all ripe strawberries are
correctly classified, which is important for actual production. As the model is improved,
the increase in feature computation also leads to a reduction in FPS, but it still far surpasses

the need for real-time processing. An FPS of 184.52 supports the rapid scanning of fruits
using robots without slowing down production due to the image processing speed. Overall,
the improved model enhances various performance metrics without sacrificing real-time
processing capabilities. These results not only validate the effectiveness of the improve-
ments but also demonstrate that the methods proposed in this study can accurately identify
strawberry ripeness in complex environments, significantly advancing the development of
automated harvesting technologies.
However, this study has certain limitations. First, although the model in this study
performs well in detecting the ripeness of strawberries, its generality and applicability to
other types of fruits or crops need further verification. Second, considering the complexity
of agricultural production, such as the impact of climate conditions and soil types in
different regions on fruit ripeness, it is necessary to explore the adaptability and robustness
of the model under more diverse conditions [27]. Additionally, although the experiment
considered various issues such as different lighting conditions, degrees of occlusion, and
angles, it overlooked more physiological details; future work can further investigate the
specific mechanisms through which these factors affect model performance and how to
further optimize the model in order to address these challenges.
To address the aforementioned shortcomings, future research will further explore
the generalization capabilities of the model, especially for different types of fruits, and
for ripeness detection at various growth stages. Moreover, given the complexities of
actual agricultural production, future research should focus more on the adaptability and
robustness of the model under real field conditions, including its response to different
climatic conditions and pest impacts. Through continuous optimization and improvement,
more technical support for the development of smart agriculture and automated harvesting
technologies can be provided, contributing to the enhancement of agricultural production
efficiency and sustainable development.

5. Conclusions
Addressing the current difficulty of balancing model accuracy and performance in
ripeness detection using automated harvesting robots, this study focused on strawberries
and proposed an improved CES-YOLOv8 network structure. During the data collection
phase, the effects of different lighting conditions, degrees of occlusion, and angles were
considered, and image data covering these scenarios were collected, effectively enhanc-
ing the model’s applicability and robustness in real agricultural environments. Targeted
improvements were made to the YOLOv8 object detection network, including the replace-
ment of some C2f modules in the backbone layer with ConvNeXt V2 modules and the
introduction of ECA attention in the layer above the SPFF. The improvements enhanced
the model’s feature diversity and generalization ability, boosting its performance. The
model’s accuracy, recall, mAP50, and F1 score reached 88.20%, 89.80%, 92.10%, and 88.99%,
respectively, showing increases of 4.8%, 2.9%, 2.05%, and 3.88%, respectively, compared to
the corresponding values of the initial YOLOv8 structure. While improving the accuracy
and precision of strawberry ripeness detection, the enhancements also effectively reduced
the problems of missed and duplicate detections. This study provides an efficient and
precise ripeness detection technique for automated harvesting robots, advancing the field
of smart agriculture, enhancing agricultural production efficiency, and supporting
sustainable agricultural development.
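To make the two architectural building blocks concrete, the sketch below gives minimal PyTorch re-implementations of the ECA attention module [21] and the Global Response Normalization (GRN) layer that distinguishes a ConvNeXt V2 block from its V1 counterpart [18]. These are illustrative versions written from the cited papers rather than the exact CES-YOLOv8 integration, and the feature-map shapes are assumed for demonstration.

```python
# Illustrative re-implementations of ECA [21] and GRN [18]; module placement
# and tensor shapes here are assumptions, not the exact CES-YOLOv8 code.
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: channel re-weighting via a 1D convolution
    over the globally pooled descriptor, with no dimensionality reduction."""
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Kernel size adapts to the channel count and is forced to be odd.
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (N, C, H, W) -> (N, C, 1, 1) -> (N, 1, C) for the 1D convolution
        y = self.pool(x).squeeze(-1).transpose(-1, -2)
        y = torch.sigmoid(self.conv(y)).transpose(-1, -2).unsqueeze(-1)
        return x * y  # broadcast channel weights over the spatial dimensions

class GRN(nn.Module):
    """Global Response Normalization, the component that turns a ConvNeXt V1
    block into a V2 block; expects channels-last input (N, H, W, C)."""
    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)  # global aggregation
        nx = gx / (gx.mean(dim=-1, keepdim=True) + 1e-6)   # divisive normalization
        return self.gamma * (x * nx) + self.beta + x       # calibrated residual

# Shape check on a feature map of plausible backbone dimensions.
feat = torch.randn(1, 256, 20, 20)
assert ECA(256)(feat).shape == feat.shape
assert GRN(256)(feat.permute(0, 2, 3, 1)).shape == (1, 20, 20, 256)
```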
Author Contributions: Conceptualization, methodology, investigation, formal analysis, data curation,
validation, writing—original draft, and writing—review and editing, Y.C.; conceptualization, method-
ology, software, investigation, formal analysis, data curation, validation, and writing—original draft,
H.X.; methodology, writing—original draft, visualization, investigation, validation, and
writing—review and editing, P.C.; methodology, investigation, writing—review and editing, and
validation, Y.H. and F.Z.; investigation, formal analysis, writing—review and editing, and validation,
L.C. and Q.J.; conceptualization, resources, supervision, and writing—review and editing, H.Z. and
S.L. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded through the following grant: Key Technology for Digitization of
Characteristic Agricultural Industries in Fujian Province (XTCXGC2021015).
Data Availability Statement: Since the project presented in this research has not yet concluded, the
experimental data will not be disclosed for the time being. Should readers require any supporting
information, they may contact the corresponding author via email.
Conflicts of Interest: The authors declare no conflicts of interest. The funders had no role in the design
of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or
in the decision to publish the results.
References
1. Rehman, A.; Saba, T.; Kashif, M.; Fati, S.M.; Bahaj, S.A.; Chaudhry, H. A revisit of internet of things technologies for monitoring
and control strategies in smart agriculture. Agronomy 2022, 12, 127. [CrossRef]
2. Yamamoto, S.; Hayashi, S.; Yoshida, H.; Kobayashi, K. Development of a stationary robotic strawberry harvester with a picking
mechanism that approaches the target fruit from below. Jpn. Agric. Res. Q. 2014, 48, 261–269. [CrossRef]
3. Hayashi, S.; Yamamoto, S.; Saito, S.; Ochiai, Y.; Kamata, J.; Kurita, M.; Yamamoto, K. Field operation of a movable strawberry-
harvesting robot using a travel platform. Jpn. Agric. Res. Q. 2014, 48, 307–316. [CrossRef]
4. Kaur, H.; Sawhney, B.K.; Jawandha, S.K. Evaluation of plum fruit maturity by image processing techniques. J. Food Sci. Technol.
2018, 55, 3008–3015. [CrossRef] [PubMed]
5. Villaseñor-Aguilar, M.J.; Botello-Álvarez, J.E.; Pérez-Pinal, F.J.; Cano-Lara, M.; León-Galván, M.F.; Bravo-Sánchez, M.-G.; Barranco-
Gutierrez, A.I. Fuzzy classification of the maturity of the tomato using a vision system. J. Sens. 2019, 2019, 3175848. [CrossRef]
6. Yin, Y.; Guo, C.; Shi, H.; Zhao, J.; Ma, F.; An, W.; He, X.; Luo, Q.; Cao, Y.; Zhan, X. Genome-wide comparative analysis of the
R2R3-MYB gene family in five solanaceae species and identification of members regulating carotenoid biosynthesis in wolfberry.
Int. J. Mol. Sci. 2022, 23, 2259. [CrossRef] [PubMed]
7. Parvathi, S.; Selvi, S.T. Detection of maturity stages of coconuts in complex background using Faster R-CNN model. Biosyst. Eng.
2021, 202, 119–132. [CrossRef]
8. Wang, L.M.; Jiang, Y. Automatic grading of banana ripeness based on deep learning. Food Mach. 2022, 38, 149–154. [CrossRef]
9. Wang, Z.; Ling, Y.; Wang, X.; Meng, D.; Nie, L.; An, G.; Wang, X. An improved Faster R-CNN model for multi-object tomato
maturity detection in complex scenarios. Ecol. Inform. 2022, 72, 101886. [CrossRef]
10. Chen, F.; Zhang, X.; Zhu, X.; Li, Z.; Lin, J. Detection of olive fruit maturity based on improved EfficientDet. Trans. Chin. Soc. Agric.
Eng. 2022, 38, 158–166.
11. Wang, C.; Wang, C.; Wang, L.; Wang, J.; Liao, J.; Li, Y.; Lan, Y. A lightweight cherry tomato maturity real-time detection algorithm
based on improved YOLOV5n. Agronomy 2023, 13, 2106. [CrossRef]
12. Kazama, E.H.; Tedesco, D.; Carreira, V.d.S.; Júnior, M.B.; de Oliveira, M.F.; Ferreira, F.M.; Junior, W.M.; da Silva, R.P. Monitoring
coffee fruit maturity using an enhanced convolutional neural network under different image acquisition settings. Sci. Hortic.
2024, 328, 112957. [CrossRef]
13. Megalingam, R.K.; Manoharan, S.K.; Maruthababu, R.B. Integrated fuzzy and deep learning model for identification of coconut
maturity without human intervention. Neural Comput. Appl. 2024, 1–13. [CrossRef]
14. Zhang, W.; Liu, Y.; Chen, K.; Li, H.; Duan, Y.; Wu, W.; Shi, Y.; Guo, W. Lightweight fruit-detection algorithm for edge computing
applications. Front. Plant Sci. 2021, 12, 740936. [CrossRef] [PubMed]
15. Xiao, Z.Q.; He, J.X.; Chen, D.B.; Zhan, Y.; Lu, Y.L. Automatic classification method of rock spectra based on twin network model.
Spectrosc. Spectr. Anal. 2024, 44, 558–562.
16. Wang, Y.T.; Zhou, H.Q.; Yan, J.X.; He, C.; Huang, L.L. Progress in computational optics research based on deep learning algorithms.
Chin. J. Lasers 2021, 48, 1918004.
17. Zhao, J.D.; Zhen, G.Y.; Chu, C.Q. Drone image target detection algorithm based on YOLOv8. Comput. Eng. 2024, 50, 113–120.
18. Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. Convnext v2: Co-designing and scaling convnets with masked
autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada,
17–24 June 2023; pp. 16133–16142.
19. Li, Y.; He, Z.; Ma, J.; Zhang, Z.; Zhang, W.; Chatterjee, P.; Pamucar, D. A Novel Feature Aggregation Approach for Image Retrieval
Using Local and Global Features. CMES-Comput. Model. Eng. Sci. 2022, 131, 239–262. [CrossRef]
20. Zhu, M.L.; Ren, Y.Z. Screw surface defect detection based on neural networks. J. Ordnance Equip. Eng. 2024, 45, 224–231.
21. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020;
pp. 11534–11542.
22. Li, G.M.; Gong, H.B.; Yuan, K. Research on Sichuan pepper cluster detection based on lightweight YOLOv5s. Chin. J. Agric. Mech.
2023, 44, 153.
23. Gevorgyan, Z. SIoU Loss: More Powerful Learning for Bounding Box Regression, 23 May 2022. Available online: https://arxiv.org/abs/2205.12740 (accessed on 16 April 2024).
24. Gu, Z.; Zhu, K.; You, S. YOLO-SSFS: A Method Combining SPD-Conv/STDL/IM-FPN/SIoU for Outdoor Small Target Vehicle
Detection. Electronics 2023, 12, 3744. [CrossRef]
25. Raja, V.; Bhaskaran, B.; Nagaraj, K.; Sampathkumar, J.; Senthilkumar, S. Agricultural harvesting using integrated robot system.
Indones. J. Electr. Eng. Comput. Sci. 2022, 25, 152. [CrossRef]
26. Yoshida, T.; Onishi, Y.; Kawahara, T.; Fukao, T. Automated harvesting by a dual-arm fruit harvesting robot. Robomech J. 2022,
9, 19. [CrossRef]
27. Vincent, D.R.; Deepa, N.; Elavarasan, D.; Srinivasan, K.; Chauhdary, S.H.; Iwendi, C. Sensors driven AI-based agriculture
recommendation model for assessing land suitability. Sensors 2019, 19, 3667. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.