YOLO Deep Learning Algorithm

1Department of Remote Sensing and Geographic Information System, Tamil Nadu Agricultural University, Coimbatore, India
2Centre for Water and Geospatial Studies, Tamil Nadu Agricultural University, Coimbatore, India

Correspondence: Tamil Nadu Agricultural University, Coimbatore, India. E-mail: kumaraperumal.r@tnau.ac.in

Key words: agriculture; computer vision; deep learning; object detection.

Abstract
YOLO treats frame detection as a regression problem, eliminating the need for a complex detection pipeline. In agriculture, using remote sensing and drone technologies, YOLO classifies and detects crops, diseases, and pests, and is also used for land use mapping, environmental monitoring, urban planning, and wildlife monitoring. Recent research highlights YOLO's impressive performance in various agricultural applications. For instance, YOLOv4 demonstrated high accuracy in counting and locating small objects in UAV-captured images of bean plants, achieving an AP of 84.8% and a recall of 89%. Similarly, YOLOv5 showed significant precision in identifying rice leaf diseases.

Introduction
Conventional practices alone cannot meet the need because of several factors like pest and disease attacks, improper harvesting, climate factors, biodiversity, etc. Using advanced technologies like drones, artificial intelligence (AI), and robots, we can manage those factors. Thus, by introducing AI techniques we can improve production in the agriculture sector and reduce crop loss by reducing pest and disease attacks, improving nutrient management, harvesting on time, etc. Using deep learning, big data, and the Internet of Things (IoT), we can monitor crops, predict yield, manage irrigation, manage weeds, detect plant stress, etc. In computer vision, the most challenging and fundamental task is object detection. Object detection involves accurately finding the objects in the input image and classifying them according to the labels. Object classification has improved from machine learning to deep learning methods, which are based on analytics. In remote sensing, object detection is more challenging due to the smaller number of available datasets and low-resolution images (Teng et al., 2019; Yin et al., 2018, 2019).

Object detectors are divided based on their processing stage: two-stage and single-stage object detection. Two-stage object detection is represented by R-CNN (Region-based Convolutional Neural Network) (Girshick, 2015).

Researchers have continued to improve the YOLO algorithm and its networks; the underlying deep convolutional network can be improved by training on large-scale image datasets such as ImageNet and COCO. In agriculture, YOLO classifies and detects crops (Espinoza-Hernández et al., 2023; Tian et al., 2019; Wu et al., 2020), weeds (Ajayi et al., 2023), and diseases and pests (Lippi et al., 2021), and supports land use mapping (Cheng et al., 2021), environmental monitoring (Zakria et al., 2022), urban planning (Qing et al., 2021), and wildlife detection (Roy et al., 2023).

YOLO (you only look once)
YOLO is a real-time object identification technique that was introduced in 2015 by Redmon and colleagues in their research paper "You only look once: unified, real-time object detection" (Redmon et al., 2016) (Figure 1). YOLO approached the object detection problem as a spatial regression problem. The YOLO method is a direct object detection technique that employs a solitary neural network to forecast several bounding boxes and the corresponding probability of each box's class. YOLO trains directly on full-size photos and thereby enhances detection performance, implicitly adding contextual knowledge about classes and their visual properties. Notably, Fast R-CNN, a leading detection method, tends to misinterpret background patches as objects due to its limited contextual awareness.

History of YOLO

YOLOv1 - you only look once version 1
YOLOv1 can perform object detection on videos as well (Jiang et al., 2022). YOLO predicts class probabilities and bounding boxes with a single neural network in a single evaluation; by this, the optimization of the algorithm for object detection can be increased directly (Redmon et al., 2016), real-time object detection is achieved, and we can detect objects even in videos with a high frame rate. Fast R-CNN, an object detection algorithm, makes errors by identifying background patches as objects in an image, but YOLO makes less than half as many errors of this kind compared to the R-CNN algorithm. Because of YOLO's generalizability, it is more stable when applied to a new domain of interest or unexpected inputs: when YOLO trained on natural images was applied to artworks, it outperformed other algorithms such as R-CNN and DPM, although the accuracy of YOLOv1 was lower. During test time YOLO is extremely fast because it requires only a single network evaluation. Non-maximum suppression (NMS) is used here to reduce multiple-detection errors. Source code: https://github.com/pjreddie/darknet

YOLO9000
YOLO9000 can detect over nine thousand object categories. A 2% improvement in mAP is achieved by adding batch normalization to all convolutional layers in YOLO; with this, dropout can be removed from the model without overfitting. YOLO9000 was also adjusted to work better on higher-resolution inputs, by which an increase of 4% mAP is achieved. For predicting bounding boxes, YOLOv2 uses anchor boxes instead of fully connected layers. The network input was reduced to 416 x 416 images instead of 448 x 448 to obtain an odd number of locations in the feature map, so that there is a single centre cell. YOLOv2 also uses multiscale training: every 10 batches of iterations, a different resolution is chosen by the algorithm itself, so the algorithm can detect objects at different resolutions. At lower resolutions the algorithm works fairly accurately, producing 69 mAP, and at higher resolutions it still operates above real-time speed, producing 78.6 mAP (Redmon and Farhadi, 2017). On the PASCAL VOC 2012 dataset, YOLOv2 runs faster than other methods, achieving 73.4 mAP (Table 1). In YOLOv2, a detection dataset (which provides bounding boxes and objectness and classifies common objects) and a classification dataset (which expands the number of categories the algorithm can detect) are used cooperatively. A multi-label model is used to combine data that are not mutually exclusive. If an image is labelled for detection, the network can backpropagate based on the complete YOLOv2 loss function; when YOLOv2 encounters an image that requires classification, it only backpropagates the loss from the classification-specific parts of the architecture.

YOLOv3
Despite the bigger architecture, the real-time performance of YOLOv3 was maintained. The YOLOv3 architecture is made up of 53 convolutional layers. It predicts objects using a multiscale prediction method, with bounding boxes at different grid sizes, which improves the prediction of smaller objects. Using regression, YOLOv3 predicts an objectness score for each bounding box: anchor boxes having the highest overlap with a ground truth object are given an objectness score of 1, whereas the other boxes are given 0. In YOLOv3, the authors added Spatial Pyramid Pooling to the backbone of the architecture, which improves AP50 by 2.7%. An average precision (AP) of 36.2% is achieved by YOLOv3-spp on the MS COCO dataset, and 60.6% AP50 at 20 FPS is achieved by YOLOv3, two times faster than comparable detectors (Redmon and Farhadi, 2018). Source code: https://pjreddie.com/darknet/yolo/
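The multi-scale training described above for YOLOv2 can be sketched in a few lines of Python. The 320-608 range in steps of 32 follows the YOLOv2 paper (32 is the network's downsampling factor); the loop and batch counter are illustrative placeholders, not the original implementation:

```python
import random

# YOLOv2-style multi-scale training: every 10 batches, pick a new input
# resolution that is a multiple of 32.
scales = list(range(320, 640, 32))   # 320, 352, ..., 608
input_size = 416

for batch in range(1, 101):
    if batch % 10 == 0:
        input_size = random.choice(scales)
        # In a real trainer, the dataloader would now resize images
        # to (input_size, input_size) before the forward pass.
    # train_step(images_resized_to(input_size)) would go here.
print(input_size)
```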
Table 1. YOLOv2 performance on the PASCAL VOC 2007 and PASCAL VOC 2012 datasets.

VOC 2007 dataset
Detection frame | Resolution | FPS | mAP (%)
YOLOv2 (Redmon and Farhadi, 2017) | 288 x 288 | 91 | 69.0
YOLOv2 | 352 x 352 | 81 | 73.7
YOLOv2 | 416 x 416 | 67 | 76.8
YOLOv2 | 480 x 480 | 59 | 77.8
YOLOv2 | 544 x 544 | 40 | 78.6

VOC 2012 dataset
Detection frame | Resolution | mAP (%)
YOLOv2 | 544 x 544 | 73.4
SSD (Liu et al., 2016) | 512 x 512 | 74.9
SSD | 300 x 300 | 72.4
YOLOv1 | | 57.9
Fast R-CNN (Girshick, 2015) | | 68.4
Faster R-CNN (Zhang et al., 2016) | | 70.4
Table 2. Some studies related to object detection using YOLO in agriculture.

Author | YOLO model used | Images used for training | Accuracy (training) | Resolution (px) | Inference
Buzzy et al., 2020 | Tiny-YOLOv3 | >1000 | Inference time 0.01 s; F1 score 0.94; FPR 24% | 410 x 410 | Counting of plant leaves using the Tiny-YOLOv3 model
Hamidisepehr et al., 2020 | YOLOv2 | 478 | AP 97% to 55.99% | 570 x 430 | Compared different object detection algorithms for corn damage assessment
Bazame et al., 2021 | Tiny-YOLOv3 | - | mAP 84%; F1 score 82% | 800 x 800 | Mapping, classification, and detection of coffee fruits from videos using computer vision
Ohnemüller and Briassouli, 2021 | Scaled YOLOv4 | 3782 | 10% higher mAP score than the baseline model | 480 x 480 | Improvement of YOLOv4 accuracy and efficiency for detection of plants using the MS COCO dataset
Nugroho et al., 2022 | YOLOv4 | 400 | Average accuracy 94.6% | 1024 x 720 | Detection of tomato ripeness using different deep learning models
Wiggers et al., 2022 | YOLOv3 and YOLOv4 | 68 | AP 84.8% (YOLOv4); recall 89% (YOLOv4) | 416 x 416 | Bean plants captured by UAV were counted using YOLOv3 and YOLOv4 models
YOLOv4
In April 2020, YOLOv4 was introduced by Bochkovskiy and colleagues on arXiv. YOLOv4 aimed to discover the ideal equilibrium by exploring numerous modifications classified as "bag-of-freebies" and "bag-of-specials". "Bag-of-freebies" encompasses techniques that alter the training strategy and increase training cost, yet without a rise in inference time, with data augmentation being the predominant example. Conversely, "bag-of-specials" includes methods that slightly increase inference cost but markedly enhance accuracy. In YOLOv4, Self-Adversarial Training (SAT) is used, where the ground truth object is hidden and the correct object is detected based on the original labels. An AP of 43.5% is achieved on the MS COCO test-dev 2017 dataset, and 65.7% AP50 at more than 50 FPS is achieved using an NVIDIA V100 (Bochkovskiy et al., 2020). Source code: https://github.com/AlexeyAB/darknet

YOLOv5
A few months after the release of YOLOv4, YOLOv5 was released by Glenn Jocher. YOLOv5 was developed in PyTorch. It uses the AutoAnchor method, which checks whether the anchor boxes are ill-fitted for the training settings and dataset and adjusts them. Installation of YOLOv5 on IoT devices is easier because it is written in the Python programming language. Even though no article was published by the author for YOLOv5, it is said that YOLOv5 outperforms the previous versions. Different model versions of YOLOv5 have been released, such as YOLOv5n (nano), YOLOv5s (small), YOLOv5m (medium), YOLOv5l (large), and YOLOv5x (extra-large), whose convolutional sizes are changed for different hardware requirements and applications. YOLOv5x is developed for high-resource devices with high performance, whereas YOLOv5s and YOLOv5n are developed for low-resource devices. An AP of 50.7% is achieved by YOLOv5x with an image size of 640 pixels on the MS COCO test-dev 2017 dataset. Source code: https://github.com/ultralytics/yolov5
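A minimal inference sketch for YOLOv5 via PyTorch Hub, assuming the ultralytics/yolov5 repository linked above and an installed PyTorch; the model variant and image path are illustrative choices, not something prescribed by the text:

```python
import torch

# Load a pretrained YOLOv5s checkpoint from the ultralytics/yolov5 repository.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Run inference on a local image; 'image.jpg' is a placeholder path.
results = model('image.jpg')

# Print detected classes, confidences, and bounding boxes.
results.print()
print(results.pandas().xyxy[0])  # boxes as a pandas DataFrame (needs pandas)
```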
YOLOv6
YOLOv6, released by Meituan (Li et al., 2022), was trained with a self-distillation strategy. In the network design, the backbone is constructed with RepBlocks (Ding et al., 2021) for small models and with CSP blocks (Wang et al., 2020) for large models. For the neck, a PAN topology (Liu et al., 2018; used in YOLOv4 and YOLOv5) with RepBlocks or CSPStackRep blocks is adopted to obtain Rep-PAN, an enhanced version of the PAN topology. An Efficient Decoupled Head is used for head construction. For label assignment, Task Alignment Learning (TAL) (Feng et al., 2021) is considered more efficient. YOLOv6 employs a hybrid-channel strategy to create a more streamlined decoupled head: the number of intermediate 3x3 convolutional layers is decreased to just one, and the head's width is simultaneously adjusted by the width multiplier for both the backbone and the neck. These adjustments effectively diminish computational cost, resulting in decreased inference latency. YOLOv6 adopts an anchor-free (anchor point-based) detector (Ge et al., 2021; Tian et al., 2019), in which the box regression branch predicts the distance from the anchor point to each of the four sides of the bounding box (Li et al., 2022). Source code: https://github.com/meituan/YOLOv6

YOLOv7
Wang and colleagues published YOLOv7 on arXiv in July 2022. YOLOv7 outperformed all existing object detectors in both accuracy and speed in the 5 FPS to 160 FPS range. Like YOLOv4, YOLOv7 underwent training solely on the MS COCO dataset without leveraging pre-trained backbones. YOLOv7 introduced several architectural modifications and a set of "bag-of-freebies", contributing to enhanced accuracy without compromising inference speed, with the only impact being on training time. ELAN is a strategy developed to improve the learning and convergence efficiency of a deep model by controlling the shortest and longest gradient paths. YOLOv7 introduced E-ELAN, a feature designed exclusively for models that stack an unbounded number of computational blocks. E-ELAN increases network learning by shuffling and merging cardinality among distinct groups, boosting the learning process without affecting the integrity of the original gradient path. It attains the maximum accuracy, exhibiting an astonishing 56.8% average precision (AP), outperforming all other real-time object detectors specifically intended for GPUs such as the V100 when working at 30 FPS or above (C.-Y. Wang et al., 2023). Source code: https://github.com/WongKinYiu/yolov7

YOLOv8
YOLOv8 uses two loss functions to increase its performance: the CIoU and DFL loss functions are utilized for bounding-box loss, whereas binary cross-entropy is employed for classification loss. These loss functions have been demonstrated to increase object detection performance, especially when dealing with tiny objects. YOLOv8 has a semantic segmentation component known as YOLOv8-Seg; the YOLOv8-Seg model has a prediction layer and five detection modules, which are similar to the detection heads of YOLOv8. This model has exhibited leading performance on a range of object detection and semantic segmentation benchmarks, all while sustaining fast and effective processing. YOLOv8 achieved an average precision (AP) of 53.9% with an image size of 640 pixels, a substantial improvement over YOLOv5's AP of 50.7% at the identical input size. YOLOv8x reaches a speed of 280 frames per second (FPS) when running on an NVIDIA A100 with TensorRT, as indicated by Terven and Cordova-Esparza (2023). Source code: https://github.com/ultralytics/ultralytics
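Since YOLOv8 ships through the Ultralytics package linked above, a minimal inference sketch follows; the yolov8n.pt weights name, image path, and confidence threshold are illustrative assumptions:

```python
from ultralytics import YOLO

# Load a pretrained nano checkpoint; 'yolov8n.pt' is downloaded on first use.
model = YOLO('yolov8n.pt')

# Run prediction on a placeholder image path with a 0.25 confidence threshold.
results = model.predict('image.jpg', conf=0.25)

# Each result holds boxes in xyxy format plus confidences and class ids.
for box in results[0].boxes:
    print(box.xyxy, box.conf, box.cls)
```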
YOLOv9
YOLOv9 involves the use of PGI (Programmable Gradient Information) and a lightweight network called GELAN (Generalised Efficient Layer Aggregation Network) (Wang et al., 2024). PGI is an auxiliary supervision framework developed to solve information bottleneck problems, such as the loss of information during the feedforward pass. PGI consists of three components: a main branch, an auxiliary reversible branch, and a multi-level auxiliary branch. The auxiliary reversible branch is applied in PGI to retain the information that would otherwise be lost to the information bottleneck. By introducing GELAN (formed by combining CSPNet and ELAN), the authors improved the model's architecture and reduced the information bottleneck (Tishby and Zaslavsky, 2015) (Figure 2).

Figure 2. Comparison chart of YOLOv9 with other state-of-the-art object detectors.
Confusion matrix
The confusion matrix is a highly popular measure utilized when solving classification problems. It can be applied to binary classification as well as to multi-class classification issues. Confusion matrices represent counts from predicted and actual values. The entry "TN" stands for true negative and shows the number of negative cases identified accurately; similarly, "TP" stands for true positive and shows the number of positive cases identified accurately. To compute the matrix, users need to pass the real values and the expected values to the function (Kulkarni et al., 2020) (Figure 3).
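A minimal sketch of this computation, assuming scikit-learn (the library behind Pedregosa et al., 2011) and illustrative label vectors:

```python
from sklearn.metrics import confusion_matrix

# Actual (real) values and the model's predicted values for a binary task.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)

tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)
```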
Intersection over union (IoU)
Bounding boxes are rectangular zones drawn around the object of interest in an image; x and y coordinates are used to represent the bounding boxes. Object detection methods such as YOLO, CNN-based detectors, and SSD use bounding boxes with probabilistic classes for identified objects (Breuers et al., 2016). Object tracking, instance segmentation (Hsu et al., 2019), and scene understanding in images are done using bounding boxes.

IoU is the ratio of the area of intersection of two bounding boxes to their area of union. Mathematically,

IoU = Area of Overlap / Area of Union

where Area of Overlap is the region in which the predicted and ground truth bounding boxes overlap, and Area of Union is the combined region covered by both the predicted and the ground truth bounding boxes. Intersection over Union values range from 0 to 1. An IoU of zero indicates no overlap between the ground truth bounding box and the predicted bounding box; an IoU of one indicates a perfect match, i.e., the ground truth box is precisely aligned with the predicted box.
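A short sketch of this computation in Python, assuming boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Width/height clamp to zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143: partial overlap
```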
Precision and recall
Precision measures the correctness of positive predictions; this measure is alternatively referred to as the positive predictive value. Recall, also termed sensitivity, evaluates a model's capability to predict positive outcomes effectively (Chen, 2021; Pedregosa et al., 2011). A good F1 score suggests that good precision and recall values were attained. In the context of object detection:

TP (true positive) = objects that were detected as that object.
FP (false positive) = objects other than the object of interest that were detected as that object.
FN (false negative) = objects of interest that were not detected.
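In terms of these counts, the standard definitions implied above can be computed directly; the example counts are illustrative:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard metrics from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)   # correctness of positive predictions
    recall = tp / (tp + fn)      # ability to find all positive cases
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(precision_recall_f1(tp=80, fp=20, fn=10))  # (0.8, 0.888..., 0.842...)
```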
Non-maximum suppression algorithm
Non-maximum suppression (NMS) is employed as a post-processing methodology to enhance object detection by mitigating the occurrence of overlapping bounding boxes and enhancing overall accuracy. During the object detection process, the algorithm commonly produces numerous bounding boxes around the desired object, each accompanied by a distinct confidence score (Figure 6). To eliminate redundant and repetitive boxes and retain only the most accurate ones, we utilize NMS (Hosang et al., 2017). Subramanyam (2021) describes the procedure as an iterative comparison of overlapping boxes in which, at each step, the box with lower confidence is removed from the list; this step is repeated until we have gone through all the boxes in the list. A sketch of the procedure is given below.
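A minimal sketch of greedy NMS built on the iou() helper defined earlier; the score values and IoU threshold are illustrative:

```python
def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes:  list of (x1, y1, x2, y2) tuples
    scores: list of confidence scores, one per box
    Returns the indices of the boxes that are kept.
    """
    # Sort box indices by confidence, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # most confident remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the best box too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the near-duplicate box 1 is suppressed
```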
YOLO architecture and design principles
YOLO partitions an image into a grid with dimensions S x S. Within each grid cell, predictions are made for B bounding boxes and their corresponding confidence levels. The confidence of an object indicates the reliability and accuracy of the bounding box that both identifies and classifies the object (Štancel and Hulič, 2019). The core idea guiding the detection of an object within any grid cell is that the centre of the object must be situated inside that specific grid cell; the detection of a particular object is the responsibility of that grid cell, aided by an appropriate bounding box (Diwan et al., 2023). The grid cell forecasts parameters for each bounding box, with the first five parameters being specific to that bounding box, while the remaining parameters are common to all bounding boxes within the same grid cell, regardless of the bounding boxes present.
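Concretely, each cell therefore predicts B x 5 box parameters (x, y, w, h, confidence) plus C shared class probabilities. A sketch of the resulting output layout, using the classic YOLOv1 settings S=7, B=2, C=20 (these specific values are assumptions drawn from the original paper, not stated above):

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes

# Five parameters per box (x, y, w, h, confidence) plus C class scores per cell.
per_cell = B * 5 + C
output_shape = (S, S, per_cell)

print(per_cell)      # 30
print(output_shape)  # (7, 7, 30) -> 1470 predicted values per image
```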
The bounding boxes come in different shapes to accommodate and effectively capture various objects; these are referred to as anchor boxes. The objective is to detect an object in an image with a bounding box within which the centre of the object lies. However, multiple object centres may fall within the same bounding box, so the authors introduce the term "anchor boxes" to denote the set of bounding boxes associated with a single grid cell. Anchor boxes constitute a set of standardized bounding boxes chosen by analysing the dataset and the objects in it (see the sketch below). These selected anchor boxes aim to encompass most classes/categories by considering diverse combinations of width and height, such as vertical, square, or horizontal rectangles, which ensures the representation of the various aspect ratios and scales of the objects present in the dataset.

The CNN demonstrates remarkable performance in extracting features from visual input by efficiently transmitting low-level features to the deeper layers. Adapting the CNN to the detection problem is facilitated by two essential CNN features: parameter sharing and the use of multiple filters.
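The YOLOv2 paper selects these priors with k-means clustering over the training boxes, using 1 - IoU as the distance. The sketch below is a simplified version of that idea, clustering (width, height) pairs; the sample boxes and k=3 are illustrative assumptions:

```python
import random

def wh_iou(a, b):
    """IoU of two boxes (w, h), assuming both are centred at the same point."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k=3, iters=50, seed=0):
    random.seed(seed)
    centroids = random.sample(boxes, k)
    for _ in range(iters):
        # Assign every box to the closest centroid (distance = 1 - IoU).
        clusters = [[] for _ in range(k)]
        for wh in boxes:
            best = max(range(k), key=lambda i: wh_iou(wh, centroids[i]))
            clusters[best].append(wh)
        # Move each centroid to the mean width/height of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (sum(w for w, _ in members) / len(members),
                                sum(h for _, h in members) / len(members))
    return centroids

boxes = [(12, 30), (14, 28), (40, 42), (38, 44), (90, 60), (95, 66)]
print(kmeans_anchors(boxes, k=3))
```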
Applications of YOLO in agriculture

Li et al. (2020) used Gaofen-1 and Gaofen-2 (1 m spatial resolution) satellite images for the detection of agricultural greenhouses (AG). All the architectures were implemented with the PyTorch deep learning framework, and the Darknet model of YOLOv3 was converted to the PyTorch framework. By adapting a Feature Pyramid Network (FPN) and multilabel classification, the detection of YOLOv3 was enhanced. Among the different architectures, YOLOv3 performed well, with an mAP (GF-1 and GF-2) of 90.4% at 73 FPS. They concluded that, to increase the detection quality, the spatial resolution of the input images needs to be increased.

Tundia et al. (2020) in their studies detected minor irrigation structures using Google satellite images. They compared the speed and accuracy of Faster R-CNN, YOLOv3, Tiny YOLOv3, and RetinaNet. Tiny YOLOv3 had the least inference time among the architectures due to its reduced convolutional layers, but its accuracy is reduced (Tables 2 and 3).

Table 3. Some studies related to object detection using YOLO in agriculture, listing for each study the YOLO model, the dataset used, the accuracy, the application, and the key findings. The applications covered include crop monitoring, quality assessment, weed detection in different turfgrasses (manila grass, ryegrass, and bermudagrass), grain counting, apple flower detection for timing apple thinning and predicting yield, counting and estimation in asparagus for mechanized harvesting, recognition of bird's eye chillies, tomato leaf disease identification, and stem blight and brown spot detection, mostly with YOLOv3, YOLOv4, and YOLOv5 models trained on self-built datasets.

Date palms are cultivated widely in the Arabian Peninsula, North Africa, and the Middle East. Jintasuttisak et al. (2022) used state-of-the-art YOLOv5 (small, medium, large, and extra-large), YOLOv3, YOLOv4, and SSD300 for detecting date palm trees. They randomly selected 125 images captured using an RGB drone camera, of which 60% were used for training, 20% for validation, and 20% for testing, and then applied data augmentation, which increased the range of the training dataset five times. From their studies, they concluded that YOLOv5m (medium CNN depth) performed better than the other architectures, with an mAP of 92.34%, and that YOLOv5s has the lowest training time (11.33 ms) because of its small CNN network. Nurhabib and Seminar (2022) identified and counted oil palm trees using YOLO with Citra satellite series (1, 2, 3) images. Özer et al. (2022) carried out an inter-comparative analysis of YOLOv5 in which they compared the results of YOLOv5s, YOLOv5m, and YOLOv5x for the detection of cherry trees in Afyonkarahisar. A total of 889 images were obtained, and 80% were used for training and the rest for testing. The YOLOv5s model performed well and obtained precision, recall, and F1 scores of 0.983, 0.978, and 0.980, respectively. Palm tree detection was carried out by Ariyadi et al. (2023) using 500 UAV images. The detection was carried out using YOLOv7, with a precision of 98.5%, recall of 98.17%, overall accuracy of 98.31%, and mean average precision of 99.7%. For training, they used 80% of the data, and 20% of the data was used for testing. For each image, the detection time ranged from 17 ms to 18.4 ms.
Monitoring forests enables us to tackle the loss of biodiversity in forest ecosystems and the effects of climate change. Straker and colleagues (2023) counted trees and segmented tree crowns using YOLOv5 and a tessellation approach. They used the ForInstance dataset, which consists of 4192 annotated images. The YOLO model performed 27% and 34% better than the individual tree crown approach at point densities of 50 and 10 points m-2, respectively.

In countries such as India, transmission lines pass through cultivated land, and it is important to monitor these lines to avoid damage by trees growing under them. Xu et al. (2023) classified tree species under transmission lines, with 4688, 1113, 2336, 2195, and 290 labels, respectively, for the different classes. The images were collected by drones mounted with an MS600 Pro multispectral camera, and image augmentations such as flipping, random cropping, colour dithering, rotation, scaling, and affine transformation were applied. The images were input using three different band combinations, i.e., R-G-B, NIR-R-G, and NIR-G-B. YOLOv7 achieved an average accuracy of 75.77%, and among the different band combinations the RGB composition acquired the higher mean mAP.
Weed detection
Etienne et al. (2021) used YOLOv3 for the identification of monocot and dicot weeds in the fields of corn and soybean research plots. They created four different training image sets with images acquired from 10 m above ground level (AGL), 30 m AGL, 30 m and 10 m AGL, and 10 m AGL with only dicot weeds. The obtained images were reduced to 416 x 416 pixels before training, and 25,560 weed instances were manually annotated. Average precision (AP) scores of 91.48% and 86.13% were obtained at a threshold of 0.25.

Gallo et al. (2023) used UAV images, due to their flexibility of data acquisition and high-resolution capability, and created 12,113 bounding box annotations from 3000 RGB images collected through UAV. In their studies they used two datasets: one specifically developed for chicory plantations, called the chicory plant (CP) dataset, and another called Lincoln beet (LB). For detection, they used YOLOv7 and obtained mAP@0.5, precision, and recall of 56.6%, 61.3%, and 62.1%, respectively, on the CP dataset. Using the LB dataset, they obtained overall mAP, mAP for weeds, and mAP for sugar beets of 51% to 61%, 67.5% to 74.1%, and 34.6% to 48%, respectively. For spraying weedicide, Narayana and Ramana (2023) developed object detection using YOLOv7 trained on two datasets: the early crop weed detection dataset (308 images) and the 4weed dataset (618 RGB images). They used 90% of the dataset for training and 10% for testing. The model was trained and tested in Google Colab, a cloud-based environment. An mAP of 99.6% was obtained for the Early Weed dataset and 78.53% for the 4weed dataset.
Fruit detection
Kumar and Kumar (2023) used a new approach to object detection, applying a multi-head attention mechanism and depth values to YOLOv7 for the detection of apples in an orchard. The input data were acquired with a DJI Mavic Mini 3; images were extracted from the video and then annotated, with depth label creation and augmentations such as image mirroring, blurring, and noise applied to the input. This modified YOLOv7 consists of three detection heads, which also help to detect the depth of the apple in the orchard, further used to estimate distribution and density. In the end, plain YOLOv7 could not identify all apples during detection, but the modified YOLOv7 (i.e., with the multi-head detection mechanism) detected almost all apples and gave precision, recall, and F1 scores of 0.91, 0.96, and 0.92, respectively. For better marketing, ripeness is an important factor for tomatoes; thus, tomatoes need to be harvested at the correct stage. For this, Appe et al. (2023) used a modified version of YOLO called CAM-YOLO, which used YOLOv5 with a convolutional block attention module (CBAM) for detecting ripened tomatoes, achieving an accuracy of 88.1%. Another tomato study reported improved detection of small objects because of the replacement of the C3 module used in YOLOv5 with the C2f module in YOLOv8 (Sohan et al., 2024). For annotation, they used RoboFlow and divided the dataset into 6:2:2 ratios for training, validation, and testing, respectively; an image resolution of 640 x 640 was used for training in the development of the YOLOv8m and MobileNetv2 models, which achieved 95.76%, 95.74%, and 95.75% precision, recall, and F1-score, respectively.

Fukada et al. (2023) used YOLOv5 (pre-trained on the COCO 2017 dataset) to analyse tomato growth using industrial camera devices; this implementation of YOLOv5-based object detection reduced the effort required to analyse crop growth by 80%. Lawal (2021) detected tomatoes in complex environments using YOLO-Tomato (a modified version of YOLOv3), divided into three types: YOLO-Tomato-A, YOLO-Tomato-B, and YOLO-Tomato-C. YOLO-Tomato-C, which has a Mish activation function with a front detection layer (FDL) and SPP, outperformed the other two types by producing an AP of 99.5%; the use of SPP improved the AP of the model compared to the other two models. Fruits such as bananas, apricots, apples, and strawberries ripen faster than other fruits, and detection of ripened strawberries in fields by traditional methods is time-consuming and results in spoilage. An et al. (2022) developed a strawberry growth detection algorithm based on YOLOX; though the model size remains the same as YOLOX, it has 3.64%, 2.04%, and 4.08% higher accuracy, recall, and precision, respectively. This model also addresses problems such as the low accuracy of models in complex environments. Chen et al. (2023) overcame dense and occluded grape detection and the missed detection of grapes by developing a lightweight model called GA-YOLO. In this model, SE-CSPGhostnet is designed and introduced in the backbone, with 82.79% fewer parameters; it has an mAP of 96.87% and a detection speed of 55.867 FPS. Using artificial intelligence as a classifier and cameras as sensors, Chen M.-C. et al. (2022) identified the external quality of fruits such as apples, oranges, and lemons based on size, height, width, etc., which reduces labour intensiveness and improves work speed; they used the YOLOv3 algorithm for fruit detection and acquired an accuracy of 88% by testing on 6000 images. Detecting cherry fruits in open environments results in reduced accuracy due to shading. Thus, Gai et al. (2023) introduced an improved version of YOLOv4 called YOLOv4-dense, which has a modified backbone of CSPDarknet53 combined with DenseNet. Image augmentations such as flipping, zooming, and colour gamut changes were applied to the input images, and the rectangular bounding boxes were changed into circular bounding boxes, increasing the algorithm's speed and improving feature extraction. This model produced 0.15 higher mAP than YOLOv4.
With the help of computer vision, we can reduce input and labour costs and increase production efficiency. Gremes et al. (2023) counted green oranges directly from trees against green leaf backgrounds using YOLOv4. The performance of the YOLOv4 model was compared with an optimal object detector model, where each orange in the captured video was detected frame by frame; by combining these two techniques, double-counting errors were reduced and the detected and actual orange counts were almost equal. The algorithm obtained mAP50, mAP50:95, precision, recall, F1-score, and average IoU of 80.16, 53.83, 0.92, 0.93, 0.93, and 82.08%, respectively. Due to a lack of knowledge and experience, coffee farmers find it difficult to harvest coffee fruits at the right time. Bazame et al. (2022) detected and classified coffee fruits into unripe (green), overripe (dry), and ripe (cherry) classes using YOLO. They used YOLOv3 and YOLOv4 for detection and classification; the YOLOv4 and YOLOv4-tiny models performed well and obtained mAPs of 81% and 79%, respectively. Camacho and Morocho-Cayamcela (2023) used YOLOv8 for the segmentation and detection of tomatoes at different maturity stages; YOLOv8 produced an R2 of 0.809, 0.897, and 0.968 in the ripe, half-ripe, and green categories, respectively.
Disease detection
For the detection of white leaf disease in sugarcane, images were acquired with a Phantom 4 drone equipped with RTK technology in regions of eastern Sri Lanka. The obtained images were augmented using the Python Augmentor package 0.2.9, and 1200, 240, and 240 images were used for training, validation, and testing, respectively. YOLOv5 outperformed the other algorithms in precision, mAP@0.5, and mAP@0.95, and has a very small model size of 14 MB when compared to YOLOR, DETR, and Faster R-CNN. Amarasingam et al. (2022) conducted object detection using XGB, RF, DT, and KNN in the same fields and obtained far lower accuracy than YOLOv5. Mathew and Mahesh (2022) used YOLOv3 for disease detection in apples, identifying diseases visible in apple tree leaves such as black rot, cedar rust, and apple scab. They classified the image dataset into four classes and, for each class, utilized 1500 and 500 images for training and testing, respectively; at the 700th iteration, they obtained an average loss of 0.6010. da Silva et al. (2023), in their studies on the detection of diseases in citrus, used YOLOv3 and Faster R-CNN for the detection tasks and concluded that YOLO was faster than Faster R-CNN and utilizes less computation power. They used LabelImg for annotation (Tzutalin, 2015); YOLOv3 and Faster R-CNN were run on a Keras back-end and evaluated using mAP, and during detection they used the GPS of a mobile phone to map how the infection spread through the orchard spatially. To detect crop leaf diseases, Dai and Fan (2022) used YOLOv5-CAcT with the Plant Village and AI Challenger datasets. The model achieved an accuracy of 94.24% and covered 59 crop disease categories and 10 crop species, with an average inference time of 1.563 ms and a model size of 2 MB.

Madhurya and Jubilson (2023) detected and classified plant leaf diseases using a YOLOv7 framework called YR2S (YOLO-Enhanced Rat Swarm Optimizer - Red Fox Optimization (RFO-ShuffleNetv2)). They used PCFAN for the generation of feature maps, and the model detected and classified with a high accuracy of 99.69%. Bandi et al. (2023) used YOLOv5 for leaf disease detection and U2-Net to remove the background of the affected leaf. They also used a vision transformer for classifying the disease into different stages such as high, medium, and low, using open datasets like PlantDoc and Plant Village, and achieved an F1 score of 0.57 and a confidence score of 0.2 for YOLOv5 in disease detection. Bachhal et al. (2023) used CNN+YOLO compared with other models for the detection of maize plant diseases, using a Plant Village dataset of 100 images of common rust, 50 images of southern rust, 30 images of maize leaf blight, 30 images of turcicum leaf blight, 70 images of grey leaf spot, and 90 healthy leaf images. To detect verticillium fungus in olive trees, Mamalis et al. (2023) used different models of YOLOv5, such as nano, medium, and small. For annotation they used the LabelImg package and classified the trees as healthy or damaged with a withering effect. The images were trained at two sizes, 1216 x 1216 and 640 x 640; YOLOv5m with a model input of 640 x 640 outperformed the other models, and they concluded that as input size decreases and model capacity increases, performance increases. Pine wilt disease (PWD) is one of the most dangerous diseases in forest regions because of its rapid spread and management challenges; traditional methods suffer from excessive time consumption and poor accuracy, and detection of PWD in forest regions helps policymakers manage the situation based on the results. Zhu et al. (2024) used YOLOv7-SE for the detection of PWD from high-resolution helicopter images; the model achieved a precision of 0.9281, an F1 score of 0.9117, and a recall of 0.8958. Similarly, Wu et al. (2024) used YOLOv3 with the CIoU loss function for detecting PWD, and forest pests and diseases more broadly, from UAV images. Yao et al. (2024) proposed Pine-YOLO, a modified version of YOLOv8 that identifies PWD; this model achieved mAP@0.5 of 90.69%, mAP@0.5:0.95 of 49.72%, recall of 85.72%, precision of 91.31%, and an F1-score of 88.43%.
Espinoza-Hernández et al. (2023) determined agave plant density using high-resolution RGB images captured by remotely piloted drones. They used YOLOv4 and YOLOv4-tiny for accurate detection at different phenological stages and produced a mean average accuracy of 0.99 for both architectures, with F1 scores of 0.95 and 0.96 for YOLOv4 and YOLOv4-tiny, respectively. Qin et al. (2021) developed an algorithm derived from YOLO, called Ag-YOLO, which was operated on an NCS2 (Intel Neural Compute Stick 2). They compared the developed model with YOLOv3-Tiny: Ag-YOLO outperformed YOLOv3-Tiny, producing a higher accuracy of 0.9205 (F1 score) and a higher FPS of 36.5, two times faster than Tiny YOLOv3 while using 12x fewer parameters. Counting rice seedlings traditionally is time-consuming and labour-intensive, leading to errors. Yeh et al. (2024) developed a YOLO-based, UAV-based approach for counting and marking the location of rice seedlings in the field, using YOLOv4 for counting the seedlings. Though YOLO models are weak at detecting small objects, they made changes to the images through data augmentation (image cropping) and changes to the activation function, implementing the Mish function to improve the accuracy of the architecture. They utilized the UAV dataset provided by AIdea (25 rice images with a resolution of 3000 x 2000 and 19 images with a resolution of 2304 x 1728). The experiment was conducted on six models, and it was found that model 6 (modified YOLOv4 with the Mish activation function) gave an accuracy of 0.97, an average precision of 0.917, and an F1-score of 0.91. Wang Y. et al. (2023) proposed a YOLOv5-AC model for detecting the working quality of uncrewed rice transplanters; the model achieved an accuracy of 95.8% and an F1 score of 93.39%. Lu et al. (2023) modified YOLOv8 for UAV-based object detection and developed a model for precision agriculture; compared with YOLOv8-N, this model performed well, obtaining 0.921, 0.883, 0.937, and 0.565 for precision, recall, AP50, and AP50:95, respectively. Pu et al. (2023) used a modified version of YOLOv7 called Tassel-YOLO, which uses GSConv and VoVGSCSP modules in the neck and the SIoU loss function in the head; Tassel-YOLO achieves 96.14% mAP@0.5, with a counting accuracy of 97.55%. They used the global attention mechanism (GAM) (Liu et al., 2021), which improves the feature representation ability through channel attention and the accuracy of spatial data through spatial attention (Wang et al., 2018). Images were acquired using a DJI Mavic drone, and the image resolution was reduced to 640 x 640 during the detection phase.

Future scope
Further improvements to YOLO-based agricultural applications can be expected in the upcoming days. The YOLO algorithm can be made to use RGB images as well as multispectral bands to analyse chlorophyll content and monitor the stress condition of crops in real time (Thomson and Sullivan, 2006). Increased fertilizer application results in wasted inputs and has adverse effects on the environment; using YOLO, fertilizers can be applied to specified crops through IoT technology, reducing the input cost, reducing the wastage of raw materials, and limiting the environmental impact. Weeds play a crucial role in the agricultural field since they reduce crop yield through nutrient uptake (Nath et al., 2024), and in some studies weeds were detected and management practices applied using YOLO algorithms; by implementing YOLO, crops and weeds can be distinguished so that site-specific management practices, such as applying weedicides, can be carried out. By collecting high-resolution images of croplands through drones, we can predict yield and plan harvesting, and the data obtained can be integrated with weather data, satellite imagery, and crop models to create a decision support system for farmers (Table 4).
on
In agriculture, YOLO has been used for crop detection (Sneha
coffee farmers find it difficult to harvest coffee fruits at the time of
et al., 2024), fruit detection (Appe et al., 2023), and pest and dis-
harvest. Bazame et al. (2022) detected and classified coffee fruits
ease detection (Amara et al., 2023). However, the base YOLO
into unripe(green), overripe(dry) and ripe(cherry) using YOLO.
e
algorithm struggles with identifying small objects, which poses
They used YOLOv3 and YOLOv4 for detection and classification.
challenges for detecting crops like rice, sorghum, and maize. To
us
YOLOv4 and YOLOv4-tiny models performed well and obtained
address this, modified versions like Tassel-YOLO for maize tassel
mAP of 81% and 79%. Camacho and Morocho-Cayamcela (2023)
detection (Pu et al., 2023), Ag-YOLO for broader agricultural
used YOLOv8 for the segmentation and detection of tomatoes at
studies (Qin et al., 2021), and a modified YOLOv4 for cherry
al
different maturity stages. YOLOv8 produced an R2 of
detection (Gai et al., 2023) have been developed. Further enhance-
0.809,0.897,0.968 in ripe, half-ripe and green categories respec-
ci
ly
References
Amarasingam, N., Gonzalez, F., Salgadoe, A.S.A., Sandino, J., Powell, K. 2022. Detection of white leaf disease in sugarcane crops using UAV-derived RGB imagery with existing deep learning models. Remote Sens. (Basel) 14:6137.
An, Q., Wang, K., Li, Z., Song, C., Tang, X., Song, J. 2022. Real-time monitoring method of strawberry fruit growth state based on YOLO improved model. IEEE Access 10:124363-124372.
Appe, S.N., Arulselvi, G., Balaji, G. 2023. CAM-YOLO: tomato detection and classification based on improved YOLOv5 using combining attention mechanism. PeerJ Comput. Sci. 9:e1463.
Ariyadi, M.R.N., Pribadi, M.R., Widiyanto, E.P. 2023. Unmanned aerial vehicle for remote sensing detection of oil palm trees using you only look once and convolutional neural network. 10th Int. Conf. on Electrical Engineering, Computer Science and Informatics (EECSI), Palembang. pp. 226-230.
Bachhal, P., Kukreja, V., Ahuja, S. 2023. Real-time disease detection system for maize plants using deep convolutional neural networks. Int. J. Comput. Dig. Syst. 14:10263-10275.
Bandi, R., Swamy, S., Arvind, C. 2023. Leaf disease severity classification with explainable artificial intelligence using transformer networks. Int. J. Adv. Technol. Eng. Explor. 10:278.
… using the improved YOLOv5. Agronomy (Basel) 12:2483.
Cheng, L., Li, J., Duan, P., Wang, M. 2021. A small attentional YOLO model for landslide detection from satellite remote sensing images. Landslides 18:2751-2765.
Cowton, J., Kyriazakis, I., Bacardit, J. 2019. Automated individual pig localisation, tracking and behaviour metric extraction using deep learning. IEEE Access 7:108049-108060.
da Silva, J.C., Silva, M.C., Luz, E.J., Delabrida, S., Oliveira, R.A. 2023. Using mobile edge AI to detect and map diseases in citrus orchards. Sensors (Basel) 23:2165.
Dai, G., Fan, J. 2022. An industrial-grade solution for crop disease image detection tasks. Front. Plant Sci. 13:921057.
Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., Sun, J. 2021. RepVGG: Making VGG-style convnets great again. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, Nashville. pp. 13728-13737.
Diwan, T., Anirudh, G., Tembhurne, J.V. 2023. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimed. Tools Appl. 82:9243-9275.
Dollár, P., Appel, R., Belongie, S., Perona, P. 2014. Fast feature pyramids for object detection. IEEE T. Pattern Anal. 36:1532-…
Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J. 2021. YOLOX: Exceeding YOLO series in 2021. arXiv: 2107.08430.
Girshick, R. 2015. Fast R-CNN. IEEE Int. Conf. on Computer Vision, Santiago. pp. 1440-1448.
Gremes, M.F., Fermo, I.R., Krummenauer, R., Flores, F.C., Gonçalves Andrade, C.M., da Motta Lima, O.C. 2023. System of counting green oranges directly from trees using artificial intelligence. AgriEngineering (Basel) 5:1813-1831.
Hamidisepehr, A., Mirnezami, S.V., Ward, J.K. 2020. Comparison of object detection methods for corn damage assessment using deep learning. T. ASABE 63:1969-1980.
Haque, M.E., Rahman, A., Junaeid, I., Hoque, S.U., Paul, M. 2022. Rice leaf disease classification and detection using YOLOv5. arXiv: 2209.01579.
Hobbs, J., Khachatryan, V., Anandan, B.S., Hovhannisyan, H., Wilson, D. 2021. Broad dataset and methods for counting and localization of on-ear corn kernels. Front. Robot. AI 8:627009.
Hosang, J., Benenson, R., Schiele, B. 2017. Learning non-maximum suppression. IEEE Conf. on Computer Vision and Pattern Recognition, Honolulu. pp. 6469-6477.
Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y. 2019. Weakly supervised instance segmentation using the bounding box tightness prior. 33rd Conf. on Neural Information Processing Systems (NeurIPS 2019), Vancouver.
Jiang, P., Ergu, D., Liu, F., Cai, Y., Ma, B. 2022. A review of YOLO algorithm developments. Procedia Comput. Sci. 199:1066-1073.
Jintasuttisak, T., Edirisinghe, E., Elbattay, A. 2022. Deep neural network based date palm tree detection in drone imagery. Comput. Electron. Agr. 192:106560.
Kulkarni, A., Chong, D., Batarseh, F.A. 2020. Foundations of data imbalance and solutions for a data democracy. In: Batarseh F.A., Yang R. (eds.), Data democracy. Cambridge, Academic Press. pp. 83-106.
Kumar, P., Kumar, N. 2023. Drone-based apple detection: Finding the depth of apples using YOLOv7 architecture with multi-head attention mechanism. Smart Agr. Technol. 5:100311.
Lawal, M.O. 2021. Tomato detection based on modified YOLOv3 framework. Sci. Rep. 11:1477.
Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., et al. 2022. YOLOv6: A single-stage object detection framework for industrial applications. arXiv: 2209.02976.
Li, M., Zhang, Z., Lei, L., Wang, X., Guo, X. 2020. Agricultural greenhouses detection in high-resolution satellite images based on convolutional neural networks: Comparison of faster R-…
… IEEE Access.
Madhurya, C., Jubilson, E.A. 2023. YR2S: efficient deep learning technique for detecting and classifying plant leaf diseases. IEEE Access 11:116196-116205.
Mamalis, M., Kalampokis, E., Kalfas, I., Tarabanis, K. 2023. Deep learning for detecting verticillium fungus in olive trees: Using YOLO in UAV imagery. Algorithms (Basel) 16:343.
Mathew, M.P., Mahesh, T.Y. 2022. Leaf-based disease detection in bell pepper plant using YOLO v5. Signal Image Video P. 16:841-847.
Narayana, C.L., Ramana, K.V. 2023. An efficient real-time weed detection technique using YOLOv7. Int. J. Adv. Comput. Sci. Appl. 14:550-556.
Nath, C.P., Singh, R.G., Choudhary, V.K., Datta, D., Nandan, R., Singh, S.S. 2024. Challenges and alternatives of herbicide-based weed management. Agronomy (Basel) 14:126.
Nugroho, D.P., Widiyanto, S., Wardani, D.T. 2022. Comparison of deep learning-based object classification methods for detecting tomato ripeness. Int. J. Fuzzy Logic Intell. Syst. 22:223-232.
Nurhabib, I., Seminar, K. 2022. Recognition and counting of oil palm tree with deep learning using satellite image. IOP Conf. Ser. Earth Environ. Sci. 974:012058.
Ohnemüller, L., Briassouli, A. 2021. Improving accuracy and efficiency in plant detection on a novel, benchmarking real-world dataset. IEEE Int. Workshop on Metrology for Agriculture and Forestry (MetroAgriFor), Trento-Bolzano. pp. 172-176.
Özer, T., Akdoğan, C., Cengız, E., Kelek, M.M., Yildirim, K., Oğuz, Y., Akkoç, H. 2022. Cherry tree detection with deep learning. IEEE Conf. on Innovations in Intelligent Systems and Applications (ASYU), Antalya. pp. 1-4.
Papageorgiou, C.P., Oren, M., Poggio, T. 1998. A general framework for object detection. IEEE 6th Int. Conf. on Computer Vision, Bombay. pp. 555-562.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. 2011. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12:2825-2830.
Pu, H., Chen, X., Yang, Y., Tang, R., Luo, J., Wang, Y., Mu, J. 2023. Tassel-YOLO: A new high-precision and real-time method for maize tassel detection and counting based on UAV aerial images. Drones 7:492.
Qin, Z., Wang, W., Dammer, K.-H., Guo, L., Cao, Z. 2021. Ag-YOLO: A real-time low-cost detector for precise spraying with case study of palms. Front. Plant Sci. 12:753603.
Qing, Y., Liu, W., Feng, L., Gao, W. 2021. Improved Yolo network for free-angle remote sensing target detection. Remote Sens. …
Roy, A.M., Bhaduri, J., Kumar, T., Raj, K. 2023. WilDect-YOLO: An efficient and robust computer vision-based accurate object localization model for automated endangered wildlife detection. Ecol. Inform. 75:101919.
Sneha, N., Sundaram, M., Ranjan, R. 2024. Acre-scale grape bunch detection and predict grape harvest using YOLO deep learning network. SN Comput. Sci. 5:250.
Sohan, M., Sai Ram, T., Reddy, R., Venkata, C. 2024. A review on YOLOv8 and its advancements. Int. Conf. on Data Intelligence and Cognitive Informatics. pp. 529-545.
Song, C., Wang, C., Yang, Y. 2020. Automatic detection and image recognition of precision agriculture for citrus diseases. IEEE Eurasia Conf. on IOT, Communication and Engineering, Yunlin, Taiwan. pp. 187-190.
Song, Z., Chen, Q., Huang, Z., Hua, Y., Yan, S. 2011. Contextualizing object detection and classification. IEEE T. Pattern Anal. 37:13-27.
Sportelli, M., Apolo-Apolo, O.E., Fontanelli, M., Frasconi, C., Raffaelli, M., Peruzzi, A., Perez-Ruiz, M. 2023. Evaluation of YOLO object detectors for weed detection in different turfgrass scenarios. Appl. Sci. 13:8502.
Štancel, M., Hulič, M. 2019. An introduction to image classification and object detection using YOLO detector. Proc. CEUR Workshop.
Straker, A., Puliti, S., Breidenbach, J., Kleinn, C., Pearse, G., Astrup, R., Magdon, P. 2023. Instance segmentation of individual tree crowns with YOLOv5: A comparison of approaches using the ForInstance benchmark LiDAR dataset. ISPRS Open J. Photogramm. Remote Sens. 9:100045.
Subramanyam, V.S. 2021. Non Max Suppression (NMS). Available from: https://medium.com/analytics-vidhya/non-max-suppression-nms-6623e6572536
Sulemane, S., Matos-Carvalho, J.P., Pedro, D., Moutinho, F., Correia, S.D. 2022. Vineyard gap detection by convolutional neural networks fed by multi-spectral images. Algorithms …
… convolutional neural network model for medical image segmentation. J. Healthc. Eng. 2019:8597606.
Terven, J., Cordova-Esparza, D. 2023. A comprehensive review of …
Thomson, S.J., Sullivan, D.G. 2006. Crop status monitoring using multispectral and thermal imaging systems for accessible aerial platforms. 2006 ASAE Annual Meeting 061179.
Tian, Y., Yang, G., Wang, Z., Wang, H., Li, E., Liang, Z. 2019. Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agr. 157:417-426.
Tishby, N., Zaslavsky, N. 2015. Deep learning and the information bottleneck principle. IEEE Information Theory Workshop, Jerusalem. pp. 1-5.
Tundia, C., Tank, P., Damani, O.P. 2020. Aiding irrigation census in developing countries by detecting minor irrigation structures from satellite imagery. Proc. 6th Int. Conf. on Geographical Information Systems Theory, Applications and Management. pp. 208-215.
Tzutalin, D. 2015. tzutalin/labelImg. Available from: https://github.com/tzutalin/labelImg
Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M. 2023. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, Vancouver. pp. 7464-7475.
Wang, C.-Y., Liao, H.-Y.M., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., Yeh, I.-H. 2020. CSPNet: A new backbone that can enhance learning capability of CNN. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, Seattle. pp. 1571-1580.
Wang, C.-Y., Yeh, I.-H., Liao, H.-Y.M. 2024. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv: 2402.13616.
Wang, C., Wang, C., Wang, L., Wang, J., Liao, J., Li, Y., Lan, Y. 2023. A lightweight cherry tomato maturity real-time detection algorithm based on improved YOLOv5n. Agronomy (Basel) 13:2106.
Wang, H., Fan, Y., Wang, Z., Jiao, L., Schiele, B. 2018. Parameter-free spatial attention network for person re-identification. arXiv: 1811.12150.
Wang, Y., Fu, Q., Ma, Z., Tian, X., Ji, Z., Yuan, W., et al. 2023. YOLOv5-AC: a method of uncrewed rice transplanter working quality detection. Agronomy (Basel) 13:2279.
Wang, Z., Hua, Z., Wen, Y., Zhang, S., Xu, X., Song, H. 2024. E-YOLO: Recognition of estrus cow based on improved YOLOv8n model. Expert Syst. Appl. 238:122212.
Wiggers, K.L., Pohlod, C.D., Orlovski, R., Ferreira, R., Santos, T.A. 2022. Detection and counting of plants via deep learning using images collected by RPA. Rev. Bras. Cien. Agr. 17:1.
Wu, D., Lv, S., Jiang, M., Song, H. 2020. Using channel pruning-based YOLO v4 deep learning algorithm for the real-time and accurate detection of apple flowers in natural environments. Comput. Electron. Agr. 178:105742.
Wu, Y., Yang, H., Mao, Y. 2024. Detection of the pine wilt disease using a joint deep object detection model based on drone remote sensing data. Forests (Basel) 15:869.
Xiao, Y., Tian, Z., Yu, J., Zhang, Y., Liu, S., Du, S., Lan, X. 2020. A review of object detection based on deep learning. Multim. Tools Appl. 79:23729-23791.
Xu, S., Wang, R., Shi, W., Wang, X. 2023. Classification of tree …
Yao, J., Song, B., Chen, X., Zhang, M., Dong, X., Liu, H., et al. 2024. Pine-YOLO: a method for detecting pine wilt disease in unmanned aerial vehicle remote sensing images. Forests …
… counting and location labeling of rice seedlings from unmanned aerial vehicle images. Electronics (Basel) 13:273.
Yin, S., Zhang, Y., Karim, S. 2018. Large scale remote sensing image segmentation based on fuzzy region competition and Gaussian mixture model. IEEE Access 6:26069-26080.
Yin, S., Zhang, Y., Karim, S. 2019. Region search based on hybrid convolutional neural network in optical remote sensing images. Int. J. Distrib. Sensor N. 15:1550147719852036.
Yu, J., Zhang, C., Wang, J., Zhang, M., Zhang, X., Li, X. 2023. Research on asparagus recognition based on deep learning. IEEE Access 11:117362-117367.
Zakria, Z., Deng, J., Kumar, R., Khokhar, M.S., Cai, J., Kumar, J. 2022. Multiscale and direction target detecting in remote sensing images via modified YOLO-v4. IEEE J. Sel. Top. Appl. 15:1039-1048.
Zhang, H., Cloutier, R.S. 2021. Review on one-stage object detection based on deep learning. EAI Endor. T. e-Learning 7:e5.
Zhang, L., Lin, L., Liang, X., He, K. 2016. Is faster R-CNN doing well for pedestrian detection? ECCV 2016. Lecture Notes in Computer Science, vol 9906. Springer, Cham. pp. 443-457.
Zhang, P., Li, D. 2022. YOLO-VOLO-LS: a novel method for variety identification of early lettuce seedlings. Front. Plant Sci. 13:806878.
Zhong, Z., Jin, L., Xie, Z. 2015. High performance offline handwritten Chinese character recognition using GoogLeNet and directional feature maps. 3rd Int. Conf. on Document Analysis and Recognition, Tunis. pp. 846-850.
Zhou, P., Ni, B., Geng, C., Hu, J., Xu, Y. 2018. Scale-transferrable object detection. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, Salt Lake City. pp. 528-537.
Zhu, S., Ma, W., Wang, J., Yang, M., Wang, Y., Wang, C. 2023. EADD-YOLO: An efficient and accurate disease detector for apple leaf using improved lightweight YOLOv5. Front. Plant Sci. 14:1120724.
Zhu, X., Wang, R., Shi, W., Liu, X., Ren, Y., Xu, S., Wang, X. 2024. Detection of pine-wilt-disease-affected trees based on improved YOLO v7. Forests (Basel) 15:691.