You Only Look Once Model-Based Object Identification in Computer Vision
Shiva Shankar Reddy1, Venkata Rama Maheswara Rao2, Priyadarshini Voosala1, Silpa Nrusimhadri2
1Department of Computer Science and Engineering, Sagi Rama Krishnam Raju Engineering College (A), Bhimavaram, India
2Department of Computer Science and Engineering, Shri Vishnu Engineering College for Women (A), Bhimavaram, India
Corresponding Author:
Shiva Shankar Reddy
Department of Computer Science and Engineering, Sagi Rama Krishnam Raju Engineering College
Bhimavaram, West Godavari (District), Andhra Pradesh, India
Email: [email protected]
1. INTRODUCTION
Computer vision enhancement identifies and locates one or more targets of interest in still images or video data. Image processing, pattern recognition, and artificial intelligence (AI) are among the techniques involved. The technology has a wide range of potential uses, including traffic management, accident prediction, crowd analysis, detection of dangerous substances in factories, optical character recognition, autonomous vehicles, facial and iris recognition for verification, robotics, object tracking and counting, monitoring restricted military areas, and advanced human-computer collaboration [1]. Given the intricate and unpredictable nature of many target application scenarios, achieving an optimal balance between accuracy and computational cost in real-world situations is challenging. Several approaches have been proposed to overcome this problem, notably using computer vision and deep learning methodologies [2].
You only look once (YOLO) is a realtime object detection system. It recognizes objects faster and more accurately than earlier approaches and can detect up to 80 object classes [3]. The realtime recognition system can draw a bounding box around nearby objects, recognize numerous objects in a single image, and be quickly trained and deployed in a production system. It improves, speeds up, and adapts computer vision algorithms by advancing object detection research. In Figure 1, objects belonging to different categories, such as person and train, are detected along with their confidence scores. You only look once version 4 (YOLOv4) surpasses prior methods in terms of detection accuracy, performance, and speed [4]. It is a fast detector that can be readily trained and deployed for industrial processes.
In addition to identifying different objects, they are cropped and stored in their respective directories. The person and the train shown in Figure 2 have been cropped and stored inside a designated directory. The major objective was to enhance the efficiency of the neural network detector for parallel computation. The design additionally encompasses a range of architectural decisions, taking into account their impact on the performance of different detectors, as suggested by previous YOLO models [5]. This technique predicts the classes and bounding boxes for the whole image instead of first selecting a region of interest (ROI), allowing for faster detection.
Figure 1. Object detection along with confidence score
Figure 2. Cropping of different objects present in the image
2. RELATED WORK
Yang et al. [1] suggested the Aircraft-YOLOv4 object detection algorithm, which replaces the standard convolution with a depth-wise separable convolution. They examined the Aircraft-YOLOv4 algorithm on the UCAS-AOD dataset. Aircraft-YOLOv4 can recognize airplanes in remote sensing photos with 86.92% mAP at 29.62 FPS. The results indicated that Aircraft-YOLOv4 is well suited for military remote sensing aircraft detection tasks because of its strong generalization. Kumar et al. [2] used tiny YOLO v4 with a spatial pyramid pooling (SPP) module to construct a face mask detection network model. They showed that the network model is suitable for precise mask localization on the face region in realtime surveillance applications where visibility of the whole face region is a criterion.
Wang et al. [3] trained and tested on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and Berkeley deep drive (BDD) datasets. They employed a single-stage YOLOv4-based object recognition algorithm to boost detection accuracy while maintaining realtime operation. The cross stage partial (CSP) structure in feature fusion and the receptive field block (RFB) module in the realtime object detector increase accuracy. The realtime object detector achieves 92.5% accuracy on KITTI and 93.01% on BDD. Lee et al. [4] utilized the multiple object tracking (MOT) 17-05 benchmark. Their transprecise object detection model considers the video stream's features and designs a low-overhead scheduling approach to choose the optimum deep neural network (DNN) on the fly for each video frame to improve recognition accuracy.
Bochkovskiy et al. [5] built a production-ready object detector with high operational speed, optimized for parallel computation rather than for a low theoretical computation volume (BFLOP). A steady-state genetic algorithm selected the optimum hyperparameters; it was run on YOLOv3-SPP trained with generalized intersection over union (GIoU) loss, searching on the min-val 5k set for 300 epochs. They achieved 43.5% AP (65.0% AP50) on a Tesla V100 at 65 FPS. Zhao et al. [6] applied pattern-based and block-based pruning to a wide variety of convolutional (CONV) and fully-connected (FC) layers, using pruning algorithms to guide the search. They evaluated YOLOv4 on the COCO dataset for 2D object recognition and considered 3D detection on the KITTI dataset using PointPillars. Experiments showed that the suggested method consistently achieves 55 ms inference times for YOLOv4-based 2D object detection and 99 ms for PointPillars-based 3D detection on a commercially available mobile device, with only slightly decreased accuracy. The deep learning object detector with microfluidic image-activated droplet sorting (DL-IADS) was suggested by Howell et al. [7] to perform flexible, label-free classification, counting, and localization of diverse micro-objects at high throughput. YOLOv4-tiny was used with good accuracy and speed for multi-class counting of cells, cell aggregates, and polyacrylamide (PA) beads.
To achieve high detection accuracy, Liu et al. [8] compared cutting-edge object detection techniques: RetinaNet, fully convolutional one-stage object detection (FCOS), YOLOv3, and YOLOv4. They pruned YOLOv4's shortcut layers and convolutional channels to create thinner and shallower models. Parico and Ahamed [9] used an RGB dataset. They built a realtime pear fruit detection model to increase accuracy given constraints of time, equipment, and dataset size, and examined the inference speed of the YOLOv4 family to identify which variant runs close to realtime (>24 FPS). YOLOv4 had a low false negative rate when detecting pear fruits; combined with deep simple online realtime tracking (SORT), it achieved an F1 count of 87.85%. Kumari et al. [10] introduced a mobile eye-tracking model to map mobile eye-tracking data to real objects using object recognition algorithms. They showed that combining YOLOv4 with an optical flow evaluation yields the fastest outcomes with the highest accuracy of 90% for object recognition. It allows continual framework responses to the user's gaze and reduces the review time of mobile eye-tracking data.
Chen et al. [11] used images to tackle the scale pest problem with an AI-based pest-detection framework; object recognition methods were used to analyze the data. Yu and Zhang [12] studied a face mask wearing detection algorithm based on improved YOLO v4: the adaptive image scaling approach effectively minimizes computation and redundancy, and the updated CSPDarkNet53 trunk feature extraction network reduces the network's processing cost while increasing the model's learning capacity. According to their studies, the face mask identification method has a mAP of 98.3% and a frame rate of 54.57 FPS, quicker than existing approaches. Region-based fully convolutional network (R-FCN), mask region-based convolutional neural network (R-CNN), single-shot detector (SSD), RetinaNet, and YOLOv4 were examined by Haris and Glowacz [13]. They used the Berkeley deep drive 100K dataset for these comparisons. The strengths and limitations of each are assessed by accuracy, computation time, and precision-recall curve. YOLOv4 outperformed the others in recognizing challenging road target objects under various road settings and weather conditions. Babu et al. [14] worked on identifying facial expressions using Bezier curves, and image segmentation based on scanned document detection was done using a neural network (NN) [15]. Shankar et al. [16] developed a tool to remove noise from portable gray map (PGM) images and designed a framework using the YOLO model [17]. Gupta et al. [18] used a noise reduction filter based on social group optimization (SGO).
Roy et al. [19] suggested a high-accuracy single-stage object detection model that turns object detection into a regression problem by generating bounding box coordinates and assigning class confidences. They also developed a high-performing realtime fine-grain object detection framework to overcome several plant disease detection obstacles that prevent conventional methods from working.
The Microsoft common objects in context (MS COCO) dataset was utilized by Cai et al. [20] as their train and test dataset. They use pruning methodologies at inference time to achieve realtime mobile object detection; specifically, a regularization-based pruning technique was applied. The results showed that the YOLOv4 model is 92.4% accurate. Guo et al. [21] suggested an improved YOLOv4-tiny to detect electronic components and validated the model on an electronic component dataset. Electronic components are tiny, hard to discern, and move on a transit line, making object detection harder. Compared to faster R-CNN, SSD, RefineDet, EfficientDet, and YOLOv4, the improved YOLOv4-tiny has the highest detection accuracy and quickest speed and may be used to build electronics industry assembly robots. The accuracy of the initial algorithm increased from 93.74% to 98.6% based on trial data. Liu et al. [22] conducted several ablation tests for the updated YOLO v4 on the SeaShips and SeaBuoys datasets. They developed a reverse depth-wise separable convolution (RDSC) module and applied it to the YOLO v4 backbone and feature fusion networks. The improved YOLO v4 achieved a more than 25% increase in detection speed, with mAP improvements of 1.78% and 0.95% on the two datasets.
The 2010 and 2012 ImageNet large scale visual recognition challenge (ILSVRC) subsets were used by Krizhevsky et al. [23], who trained one of the largest CNNs on these datasets. Two new, massive datasets are LabelMe, with hundreds of thousands of fully segmented photographs, and ImageNet, with over 15 million labelled high-resolution photos in 22,000 categories. They won the ILSVRC-2012 competition with a 15.3% test error rate, compared to 26.2% for the second-best entry, using a variant of their algorithm. Fast YOLO, a smaller variant of the network presented by Redmon et al. [24], processes 155 frames per second (FPS) while achieving double the mAP of other realtime detectors. The convolutional layers are pretrained on ImageNet 1,000-class competition data. According to the data, YOLO learns general representations with 92.83% accuracy. A region proposal network (RPN) by Ren et al. [25] shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals. Region proposal networks (RPNs) predict object bounds and objectness scores. The VGG-16 based model runs at 5 FPS (including all stages) on a GPU while achieving state-of-the-art accuracy on the pattern analysis, statistical modelling, and computational learning (PASCAL) visual object classes benchmark. The light detection and ranging (LiDAR) sensor was employed in the realtime object detection model by Fan et al. [26], which can offer 360° ambient depth information with a detection range of 120 meters. The PASCAL visual object classes (VOC) and KITTI datasets were used: KITTI provides depth information for LiDAR segmentation (LS) of objects from
LiDAR point clouds, while PASCAL VOC was used to train the YOLOv4 neural network for object identification. They reported 91.27% accuracy for LiDAR segmentation.
Ganesh et al. [27] suggested a realtime object detection model for edge GPUs that improves accuracy and execution performance on edge GPU devices. YOLO-ReT with a MobileNetV2×0.75 backbone runs in realtime on Jetson Nano and scores 68.75 mean average precision (mAP) on Pascal VOC and 34.91 mAP on COCO, outperforming its competitors by 3.05 and 0.91 mAP, respectively. They also introduced a multi-scale feature interaction module into YOLOv4-tiny, which improved its performance by 1.3 and 0.9 mAP on COCO. Li et al. [28] suggested a calibrated part affinity fields technique to evaluate pedestrian posture based on the YOLOv4 structure. Explainable artificial intelligence (XAI) was employed in the risk assessment phase to interpret and estimate results. YOLOv4's total parameters were decreased by 74%, indicating it can run in realtime. Li et al. [29] worked on forward location prediction using a Siamese network to reduce false positives from noisy detections, whereas a reverse prediction check reduces false positives from the forward prediction. The remaining tracks are retained and assigned prediction confidence via weighted merging. Results showed that the suggested technique beats the state of the art on the UA-DETRAC vehicle tracking dataset and maintains realtime processing at 20.1 FPS. Sim et al. [30] proposed YOffleNet, an object detection model that limits accuracy loss while compressing the network heavily for realtime, safe driving applications on autonomous cars. Using the KITTI dataset as a test bed, experiments revealed that the proposed YOffleNet is 4.7 times more compressed than YOLOv4-s and could produce 46 FPS on an integrated graphics processing unit (GPU) system (NVIDIA Jetson AGX Xavier). Despite the high compression ratio, the accuracy dropped only to 85.8% mAP, which is just 2.6% less accurate than YOLOv4. Thus, the suggested network can reliably identify objects on an autonomous system's embedded platform.
Gao et al. [31] added a channel attention mechanism to the YOLOv4 algorithm and created an object recognition method with an efficient channel attention mechanism to improve visual feature representation. The module first performs global average pooling on the features extracted by YOLOv4, then performs a local cross-channel interaction operation on the feature channels using one-dimensional convolution to increase the correlation between channel features and improve localization accuracy. Guo et al. [32] developed a deep learning (YOLO model) based realtime object recognition system for mixed reality devices. Using the YOLO paradigm, they presented a HoloLens-Ubuntu realtime communication system for object identification. The experimental results indicated that HoloLens realtime object identification using the suggested model is quick and accurate at 92.8%. They believe it makes the Microsoft HoloLens a robot vision device and improves human-robot collaboration. To analyze 24 geo-referenced RGB images of an 8-ha vineyard and determine the number of grape bunches, Sozzi et al. [33] employed a grape yield spatial variability assessment model. This was done for several target image sizes (320-1,280 pixels) and confidence thresholds (0.25-0.35). Subsequently, the number of detected bunches was compared to the actual number, together with the total weight obtained from the vines in the collected images.
3. METHODOLOGY
Digitally detecting semantic entities like people, buildings, automobiles, and animals in images and videos is called object detection. It involves image processing and computer vision. YOLOv4, a state-of-the-art (SOTA) realtime object detection model, is used. YOLOv4 is the fourth version of YOLO. It achieved SOTA performance on the 80-category common objects in context (COCO) dataset. The YOLOv4 detector is single-stage. One-stage object detection prioritizes inference speed: one-stage detector models predict classes and bounding boxes over the whole image without first proposing ROIs, and are thus quicker than two-stage detectors.
3.1. Dataset
The data was collected from the Microsoft-published MS COCO dataset, a large-scale object detection, segmentation, and captioning dataset. Machine learning and computer vision researchers widely use the COCO dataset for computer vision projects. The YOLO model applied to this dataset achieved the objectives of the work.
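As a brief illustration of working with this dataset, the sketch below inspects the MS COCO annotations with the pycocotools package; the annotation file path and the chosen category are placeholder assumptions, not details taken from this work.

from pycocotools.coco import COCO

# Hypothetical path to the COCO instance annotations.
coco = COCO("annotations/instances_val2017.json")

# The 80 object categories that YOLOv4 is trained to detect on COCO.
category_names = [cat["name"] for cat in coco.loadCats(coco.getCatIds())]
print(len(category_names), "categories, e.g.:", category_names[:5])

# Count the ground-truth boxes for one example category, e.g. "person".
person_ids = coco.getCatIds(catNms=["person"])
print("person annotations:", len(coco.getAnnIds(catIds=person_ids)))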
3.2. Objectives
To fill this gap, the objectives of computer vision enhancement using YOLOv4 are to count the total number of objects in the image, count the objects per class in the image, and crop the detected objects and save them as new images in a new folder. The user can easily identify the objects in the images using these capabilities.
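The minimal sketch below shows how such custom functions could look, assuming the detector returns (class_name, confidence, (x, y, w, h)) tuples with each box given by its top-left corner, width, and height; the function names and box format are illustrative assumptions rather than the exact implementation used in this work.

import os
from collections import Counter

import cv2  # OpenCV, assumed available for image I/O


def count_objects(detections):
    # Total number of detected objects in the image.
    return len(detections)


def count_objects_per_class(detections):
    # Number of detected objects for each class.
    return Counter(name for name, _conf, _box in detections)


def crop_objects(image, detections, out_dir="crop"):
    # Crop every detection and save it with its class name as the file name.
    os.makedirs(out_dir, exist_ok=True)
    counts = Counter()
    for name, _conf, (x, y, w, h) in detections:
        counts[name] += 1
        crop = image[int(y):int(y + h), int(x):int(x + w)]
        cv2.imwrite(os.path.join(out_dir, f"{name}_{counts[name]}.png"), crop)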
3.3.1. Input
The first input is our collection of training photographs, which are fed into the network in batches and processed on the GPU. The input stage comes first in the YOLOv4 pipeline, and the entire flow is shown in Figure 4. The provided input can take several forms, such as images, videos, patches, and image pyramids.
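As an illustration of this step, the sketch below prepares a batch of images for a 416x416 network input using OpenCV and NumPy; the input size and preprocessing choices are assumptions for demonstration, not a description of the exact pipeline used here.

import cv2
import numpy as np


def make_batch(image_paths, input_size=416):
    batch = []
    for path in image_paths:
        img = cv2.imread(path)                      # BGR image from disk
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB
        img = cv2.resize(img, (input_size, input_size))
        batch.append(img.astype(np.float32) / 255.0)  # scale to [0, 1]
    return np.stack(batch)                          # shape: (N, 416, 416, 3)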
3.3.3. Neck
Features converge at the neck. It collects feature maps from different backbone stages and merges them to prepare them for the next phase. The neck contains several top-down and bottom-up paths. Spatial pyramid pooling (SPP) is added between the CSPDarkNet53 backbone and the path aggregation network (PANet) feature aggregator. It increases the receptive field and separates out the most significant context features without reducing network speed. It is attached after the final convolutional layers of CSPDarkNet53. A single kernel or filter is applied to the image's receptive field; with dilated convolutions the receptive field grows exponentially, introducing non-linearity. A modified path aggregation network is utilized to make YOLOv4 more suitable for single-GPU training, as illustrated in Figure 5. PANet's primary function is to increase segmentation efficiency by preserving spatial information, which aids in appropriate pixel localization for mask prediction. The main qualities that make it precise for mask prediction are bottom-up path augmentation, adaptive feature pooling, and fully connected fusion.
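A minimal PyTorch sketch of such an SPP block is given below, assuming the commonly used pooling kernel sizes 5, 9, and 13 with stride 1 and padding so the spatial size is preserved; it is an illustrative sketch, not the exact YOLOv4 implementation.

import torch
import torch.nn as nn


class SPPBlock(nn.Module):
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
             for k in kernel_sizes]
        )

    def forward(self, x):
        # Concatenate the input with its pooled versions along the channel
        # axis, so the output has four times the input channels.
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)


features = torch.randn(1, 512, 13, 13)  # dummy backbone feature map
print(SPPBlock()(features).shape)       # torch.Size([1, 2048, 13, 13])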
3.3.4. Head
The head's primary task in YOLOv4 is prediction, which comprises classification and regression of bounding boxes. The primary goal is to find bounding boxes and categorize them. The bounding box coordinates (x, y, height, width) and the scores are predicted. The bounding box's center x and y coordinates are expressed relative to the bounds of the grid cell, while the width and height are expressed relative to the whole image. The YOLOv4 head assigns predictions to anchor boxes. As shown in Figure 6, anchor boxes allow many objects of varied sizes to be held in a single frame with their centers in the same cell, in contrast to the preceding illustration, where a grid was utilized to recognize a single object per cell.
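The sketch below decodes one bounding box from raw head outputs under the standard YOLO parameterization, in which the centre is offset inside its grid cell and the width and height scale an anchor box; all numeric values are illustrative and not outputs of a trained model.

import math


def decode_box(tx, ty, tw, th, cx, cy, grid_size, anchor_w, anchor_h):
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    x = (cx + sigmoid(tx)) / grid_size  # centre x, relative to the image
    y = (cy + sigmoid(ty)) / grid_size  # centre y, relative to the image
    w = anchor_w * math.exp(tw)         # width, relative to the image
    h = anchor_h * math.exp(th)         # height, relative to the image
    return x, y, w, h


# Example: a box predicted in grid cell (6, 4) of a 13x13 grid.
print(decode_box(0.2, -0.1, 0.3, 0.1, cx=6, cy=4, grid_size=13,
                 anchor_w=0.28, anchor_h=0.22))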
This is done for each bounding box separately. For each bounding box, a numerical output is obtained and used as the confidence score. Two such outputs are obtained for the two bounding boxes per grid cell used in the experiment; they correspond to the two terms on the left-hand side of the equation above. Each is then multiplied by the conditional probability that the grid cell contains a specific class, given that it contains an object. This yields a confidence score for each box and class.
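The small sketch below works through this multiplication for one grid cell with two bounding boxes and two example classes (person and train); the numbers are illustrative only.

import numpy as np

# Confidence (objectness) scores for the two boxes predicted by one grid cell.
box_confidence = np.array([0.85, 0.40])

# Conditional probabilities Pr(class | object) for, e.g., person and train.
class_probs = np.array([0.90, 0.10])

# Outer product: one class-specific confidence score per (box, class) pair.
class_scores = box_confidence[:, None] * class_probs[None, :]
print(class_scores)  # [[0.765 0.085]
                     #  [0.36  0.04 ]]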
4. RESULTS
The system detects the objects in the image according to their class and displays the counts on the output image at the top left corner, as shown in Figure 11. In Figure 11(a), the images were taken as input, and Figure 11(b) displays the per-class counts in the left corner. Blurred images were deliberately included to test the detector; three different images are shown.
The system also detects the total number of objects in the input image and displays it on the output image at the top left corner, as shown in Figure 12. In Figure 12(a), the images were taken as input, and Figure 12(b) shows the corresponding outputs. Note that Figure 11 reports detections per class, whereas Figure 12 reports the total number of objects.
Figure 11. Results of the object detected per class as given in (a) input images and (b) output images
Figure 12. Results for objects detected as given in (a) input images and (b) output images
After the results of Figure 12, the detected objects are cropped and saved as new images, with their class name as the image name, in a new folder called crop, as shown in Figure 13. In Figure 13(a), the images are the inputs, and Figure 13(b) shows the images after cropping. Some blurred images were also used as input and the objects were cropped from them; after cropping, the images are considered the output, as shown in Figure 13. The custom functions are built on top of the YOLOv4 framework. The output includes the number of items identified, as well as a bounding box around each object. The confidence score indicates the likelihood that the detected object belongs to the predicted class according to YOLOv4. A confidence threshold is applied to eliminate detections with low confidence.
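A minimal sketch of such a confidence-threshold filter is shown below, reusing the hypothetical (class_name, confidence, box) detection format from the earlier sketch; the 0.5 threshold is an assumed example value, not the one used in this work.

def filter_detections(detections, threshold=0.5):
    # Keep only detections whose confidence meets the threshold.
    return [det for det in detections if det[1] >= threshold]


detections = [("person", 0.92, (34, 50, 120, 260)),
              ("train", 0.81, (200, 40, 380, 220)),
              ("person", 0.31, (410, 55, 60, 140))]
print(filter_detections(detections))  # drops the low-confidence person box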
Figure 13. Results after cropping the images as given in (a) input images and (b) output images
5. CONCLUSION
This research was effective in detecting the objects visible in an image. In general, YOLOv4 is an advanced object identification model that can identify the many objects present in a picture. We built custom procedures to count objects per class, crop the detections, and store them in a separate folder, since YOLOv4's output information is not fully exploited by default. With the help of these custom functions, we are able to analyze the data more quickly. In addition to recognizing visual objects, a confidence score is produced to give the user a likelihood estimate. The monitoring industry may find this technology useful. We can assert that the newly implemented system has the same degree of precision. The comparison graph and result analysis show that YOLOv4 is more accurate than YOLOv3 and other realtime object identification methods.
ACKNOWLEDGEMENTS
The authors have done their work individually and declared no conflicts of interest.
REFERENCES
[1] Y. Yang et al., "Realtime detection of aircraft objects in remote sensing images based on improved YOLOv4". In 2021 IEEE 5th
Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) 2021 Mar 12 (Vol. 5, pp. 1156-
1164). IEEE.DOI: 10.1109/IAEAC50856.2021.9390673.
[2] A. Kumar et al., "A hybrid tiny YOLO v4-SPP module based improved face mask detection vision system". Journal of Ambient
Intelligence and Humanized Computing. 2021 Oct 20:1-4. DOI: https://doi.org/10.1007/s12652-021-03541-x.
[3] R. Wang et al., "A Realtime Object Detector for Autonomous Vehicles Based on YOLOv4". Computational Intelligence and
Neuroscience. 2021 Dec 10;2021. DOI:https://doi.org/10.1155/2021/9218137.
[4] J. Lee et al., "TOD: Transprecise object detection to maximise realtime accuracy on the edge". In2021 IEEE 5th International
Conference on Fog and Edge Computing (ICFEC) 2021 May 10 (pp. 53-60). IEEE. DOI: 10.1109/ICFEC51620.2021.00015.
[5] A. Bochkovskiy et al., "Yolov4: Optimal speed and accuracy of object detection". arXiv preprint arXiv:2004.10934. 2020 Apr 23.
DOI: https://doi.org/10.48550/arXiv.2004.10934.
[6] P. Zhao et al.,"Neural Pruning Search for Realtime Object Detection of Autonomous Vehicles". In2021 58th ACM/IEEE Design
Automation Conference (DAC) 2021 Dec 5 (pp. 835-840). IEEE. DOI: 10.1109/DAC18074.2021.9586163.
[7] L.Howell et al., "Multi‐Object Detector YOLOv4‐Tiny Enables High‐Throughput Combinatorial and Spatially‐Resolved Sorting
of Cells in Microdroplets". Advanced Materials Technologies. 2022 May;7(5):2101053. DOI:
https://doi.org/10.1002/admt.202101053.
[8] H. Liu et al., "Realtime small drones detection based on pruned yolov4. Sensors". 2021 May 12;21(10):3374. DOI:
https://doi.org/10.3390/s21103374.
[9] A. I. Parico and T. Ahamed, "Real time pear fruit detection and counting using YOLOv4 models and deep SORT". Sensors.
2021 Jul 14;21(14):4803. DOI: https://doi.org/10.3390/s21144803.
[10] N. Kumari et al., "Mobile Eye-Tracking Data Analysis Using Object Detection via YOLO v4". Sensors. 2021 Nov
18;21(22):7668. DOI: https://doi.org/10.3390/s21227668.
[11] JW. Chen et al.,"A smartphone-based application for scale pest detection using multiple-object detection methods". Electronics.
2021 Feb 3;10(4):372. DOI: https://doi.org/10.3390/electronics10040372.
[12] J. Yu and W. Zhang, "Face mask wearing detection algorithm based on improved YOLO-v4". Sensors. 2021 Jan;21(9):3263.
DOI: https://doi.org/10.3390/s21093263.
[13] M. Haris and A. Glowacz, "Road object detection: A comparative study of deep learning-based algorithms". Electronics. 2021
Aug 11;10(16):1932. DOI: https://doi.org/10.3390/electronics10161932.
[14] D. R. Babu et al., "Facial expression recognition using bezier curves with hausdorff distance". In2017 International Conference on
IoT and Application (ICIOT) 2017 May 19 (pp. 1-8). IEEE. DOI: 10.1109/ICIOTA.2017.8073622.
[15] R. B. Devareddi et al.,"Image segmentation based on scanned document and hand script counterfeit detection using neural
network". InAIP Conference Proceedings 2022 Dec 9 (Vol. 2576, No. 1, p. 050001). AIP Publishing LLC. DOI:
https://doi.org/10.1063/5.0105808.
[16] R. S. Shankar et al.,"Object oriented fuzzy filter for noise reduction of Pgm images". In2012 8th International Conference on
Information Science and Digital Content Technology (ICIDT2012) 2012 Jun 26 (Vol. 3, pp. 776-782). IEEE.
[17] R. S. Shankar et al.,"A Framework to Enhance Object Detection Performance by using YOLO Algorithm". In2022 International
Conference on Sustainable Computing and Data Communication Systems (ICSCDS) 2022 Apr 7 (pp. 1591-1600). IEEE. DOI:
10.1109/ICSCDS53736.2022.9760859.
[18] V. M. Gupta et al., "A Novel Approach for Image Denoising and Performance Analysis using SGO and APSO". InJournal of
Physics: Conference Series 2021 Nov 1 (Vol. 2070, No. 1, p. 012139). IOP Publishing. DOI: 10.1088/1742-6596/2070/1/012139.
[19] A. M. Roy et al., "A fast accurate fine-grain object detection model based on YOLOv4 deep neural network". Neural Computing
and Applications. 2022 Mar;34(5):3895-921. DOI: https://doi.org/10.1007/s00521-021-06651-x.
[20] Y. Cai et al., "Yolobile: Realtime object detection on mobile devices via compression-compilation co-design". InProceedings of
the AAAI Conference on Artificial Intelligence 2021 May 18 (Vol. 35, No. 2, pp. 955-963).
DOI:https://doi.org/10.1609/aaai.v35i2.16179.
[21] C. Guo et al., "Improved YOLOv4-tiny network for realtime electronic component detection". Scientific Reports. 2021 Nov
23;11(1):1-3. DOI: https://doi.org/10.1038/s41598-021-02225-y.
[22] T. Liu et al., "Sea Surface Object Detection Algorithm Based on YOLO v4 Fused with Reverse Depthwise Separable Convolution
(RDSC) for USV". Journal of Marine Science and Engineering. 2021 Jul 7;9(7):753. DOI: https://doi.org/10.3390/jmse9070753.
[23] A. Krizhevsky et al., "Imagenet classification with deep convolutional neural networks". Communications of the ACM. 2017 May
24;60(6):84-90.
[24] J. Redmon et al., "You only look once: Unified, realtime object detection". InProceedings of the IEEE conference on computer
vision and pattern recognition 2016 (pp. 779-788).
[25] S. Ren et al., "Faster R-CNN: Towards realtime object detection with region proposal networks". Advances in neural information
processing systems. 2015;28.
[26] Y. C. Fan et al.,"Realtime Object Detection for LiDAR Based on LS-R-YOLOv4 Neural Network". Journal of Sensors. 2021 May
26;2021. DOI: https://doi.org/10.1155/2021/5576262.
[27] P. Ganesh et al., "YOLO-ReT: Towards high accuracy realtime object detection on edge GPUs". InProceedings of the IEEE/CVF
Winter Conference on Applications of Computer Vision 2022 (pp. 3267-3277).
[28] Y. Li et al., "A deep learning-based hybrid framework for object detection and recognition in autonomous driving". IEEE Access.
2020 Oct 23; 8:194228-39. DOI: 10.1109/ACCESS.2020.3033289.
[29] Y. Li et al., "Crop pest recognition in natural scenes using convolutional neural networks". Computers and Electronics in
Agriculture. 2020 Feb 1;169: 105174.DOI: https://doi.org/10.1016/j.compag.2019.105174.
[30] I. Sim et al., "Developing a Compressed Object Detection Model based on YOLOv4 for Deployment on Embedded GPU Platform
of Autonomous System". arXiv preprint arXiv:2108.00392. 2021 Aug 1.DOI: https://doi.org/10.48550/arXiv.2108.00392.
[31] C. Gao, "YOLOv4 object detection algorithm with efficient channel attention mechanism". In 2020 5th International Conference
on Mechanical, Control and Computer Engineering (ICMCCE) 2020 Dec 25 (pp. 1764-1770). IEEE. DOI:
10.1109/ICMCCE51767.2020.00387.
[32] J. Guo et al., "Realtime Object Detection with Deep Learning for Robot Vision on Mixed Reality Device". In2021 IEEE 3rd
Global Conference on Life Sciences and Technologies (LifeTech) 2021 Mar 9 (pp. 82-83). IEEE. DOI:
10.1109/LifeTech52111.2021.9391811.
[33] M. Sozzi et al., "Grape yield spatial variability assessment using YOLOv4 object detection algorithm". Proceedings of the
Precision Agriculture '21, ECPA. 2021 Jul 19.