Article
Edge-YOLO: Lightweight Infrared Object Detection Method
Deployed on Edge Devices
Junqing Li and Jiongyao Ye *
School of Information Science and Engineering, East China University of Science and Technology,
Shanghai 200237, China; [email protected]
* Correspondence: [email protected]
Abstract: Existing target detection algorithms for infrared road scenes are often computationally
intensive and require large models, which makes them unsuitable for deployment on edge devices.
In this paper, we propose a lightweight infrared target detection method, called Edge-YOLO, to
address these challenges. Our approach replaces the backbone network of the YOLOv5m model
with a lightweight ShuffleBlock and a strip depthwise convolutional attention module. We also
applied CAU-Lite as the up-sampling operator and EX-IoU as the bounding box loss function. Our
experiments demonstrate that, compared with YOLOv5m, Edge-YOLO is 70.3% less computationally
intensive, 71.6% smaller in model size, and 44.4% faster in detection speed, while maintaining the
same level of detection accuracy. As a result, our method is better suited for deployment on embedded
platforms, making effective infrared target detection in real-world scenarios possible.
Keywords: infrared object detection; lightweight network; convolutional attention; YOLOv5; RK3588
1. Introduction
Visible light images are commonly used in target detection due to their high resolution,
definition, and detailed visual information that is easily interpreted by the human eye.
However, these images are sensitive to external factors such as weather and lighting con-
ditions, which can reduce image quality and negatively impact target detection accuracy.
This is where infrared imaging technology plays a crucial role. By overcoming these limitations, infrared imaging allows for image acquisition under diverse lighting and weather conditions, including foggy days and nighttime scenarios. As a result, this technology has been widely adopted in various fields, such as autonomous driving, security monitoring, and remote sensing, and offers a broader range of use cases than visible images. Therefore, the importance of infrared imaging technology in enabling efficient and reliable target detection cannot be overstated.
Traditional infrared target detection techniques [1–3] are mainly model-based methods, such as template matching, threshold segmentation, and the Hausdorff metric. However, with the development of deep learning, target detection techniques based on convolutional neural networks have emerged in recent years. These methods are primarily divided into two-stage algorithms (e.g., Faster-RCNN [4]) and single-stage algorithms (e.g., SSD [5] and YOLO [6]). The single-stage algorithm is designed to achieve a balance between detection speed and accuracy, resulting in a significant improvement in detection speed while maintaining accuracy compared with the two-stage algorithm. Consequently, the YOLO series has become a widely used representative of the single-stage algorithm. Among the mainstream YOLO algorithms, YOLOv5 (version 6.2) has achieved significant improvement in both detection accuracy and speed compared with its predecessor by using Mosaic data enhancement, C3 modules, and an improved SPPF module. Although YOLOv5 performs well on visible images, it encounters several challenges when applied to infrared detection. The first major issue is that infrared images suffer from poor contrast, high noise, and blurred imaging, leading to the loss of crucial target features during deep convolutional network
processing. Moreover, since infrared images lack color information, and the difference
between target and background features is minimal, it becomes challenging for deep convo-
lutional neural networks to distinguish useful information from irrelevant data, reducing
detection accuracy. Another significant challenge is that embedded devices commonly
used in autonomous driving and security monitoring fields have limited computing power,
storage space, and power consumption, which makes deploying large target detection
models such as YOLOv5 difficult. Additionally, these fields require real-time detection,
and collecting data on edge devices and sending them to the server for detection and
analysis can lead to network latency and communication congestion problems in widely
distributed areas.
The challenges mentioned above necessitate lightweight infrared target detection on
edge devices. In summary, this paper proposes a solution to these issues by introducing
Edge-YOLO, an infrared target detection algorithm that utilizes lightweight networks and
attention mechanisms specifically designed for edge devices. The primary enhancements
of this algorithm can be summarized as follows:
(1) The bounding box loss function was redesigned: a loss function with a power hyperparameter α was introduced to accelerate the convergence of the loss function and resolve the uncertainty of the aspect ratio in CIoU;
(2) A lightweight content-aware up-sampling operator was adopted, which obtains a larger perceptual field than the original nearest-neighbor up-sampling method while introducing only a small number of parameters and little computational cost;
(3) The feature extraction network was reconstructed based on the improved Shuf-
fleNetv2, which enhances the extraction ability of strip features in IR scenes and
the perception ability of salient features in IR images by embedding a newly designed
strip depthwise convolutional attention module in ShuffleBlock, while significantly reducing the computational cost of the network.
The remaining sections of this paper are organized as follows: Section 2 provides an
overview of related works on target detection with neural networks, including YOLOv5,
and other algorithms in infrared target detection. Section 3 provides a detailed introduction
to the Edge-YOLO algorithm proposed in this paper. Section 4 presents the results of various
experiments conducted to evaluate the performance of Edge-YOLO. Finally, Section 5
summarizes the main contributions of this paper.
2. Related Works
The YOLO family of algorithms, known for their efficiency and simplicity, was first
introduced by Redmon et al. in 2015. In the years that followed, Redmon et al. released
YOLOv2 and YOLOv3 algorithms, which further reduced network complexity and im-
proved detection speed compared with two-stage algorithms [7]. After Redmon withdrew
from the field of computer vision, Glenn Jocher released YOLOv5 in 2020, which has since
been updated to version 6.2. YOLOv5 is composed of a backbone feature extraction module,
a neck feature fusion module, and a head detection module, as illustrated in Figure 1. The
algorithm incorporates five different scales of n, s, m, l, and x, with larger scales delivering
higher detection accuracy but slower real-time performance. However, the network struc-
ture of the models of different scales remains consistent, differing only in the number of
partial layers, and is represented uniformly as “×n” in the figure. YOLOv5 uses CSPDark-
net53 as its backbone network, which includes the Cross Stage Partial (CSP) structure [8].
The CSP structure integrates gradient changes in the feature map, reducing the problem
of repeating gradient information in the backbone network. Moreover, YOLOv5 utilizes a
bottleneck structure with residual connections in the backbone network to prevent network
degradation due to gradient disappearance, and a bottleneck structure without residual
connections in the feature fusion layer to reduce computational effort. Additionally, Jocher
employs a modified Spatial Pyramid Pooling-Fast (SPPF) structure in place of the original SPP [9]. The modified SPPF achieves the same computational results as the original parallel
MaxPool layers of three different sizes by serializing multiple MaxPool layers of the same size, significantly reducing computational time.
Figure 1. The network structure of the YOLOv5 series (including n, s, m, l, and x, depending on the number of duplicates of module C3).
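For illustration, the serialization trick can be sketched in PyTorch as follows. This is a minimal rendering of the pooling path only; the actual SPPF module also wraps the pooling in 1 × 1 convolutions, so treat it as a sketch of the idea rather than the YOLOv5 source:

```python
import torch
import torch.nn as nn

class SPPFPooling(nn.Module):
    """Serialized max-pooling as used by SPPF: three chained 5x5 pools
    reproduce the 5x5 / 9x9 / 13x13 receptive fields of the parallel SPP
    pools while reusing intermediate results."""
    def __init__(self, k: int = 5):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y1 = self.pool(x)    # equivalent to one 5x5 max pool
        y2 = self.pool(y1)   # equivalent to one 9x9 max pool
        y3 = self.pool(y2)   # equivalent to one 13x13 max pool
        return torch.cat([x, y1, y2, y3], dim=1)
```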
In addition to YOLOv5, various target detection methods have been proposed for infrared scenes by researchers. Li et al. [10] proposed the YOLO-FIRI model, an infrared image area-free target detector based on YOLOv5. They achieved good infrared target detection performance by improving the CSP structure and introducing multiple detection heads. Fan et al. [11] improved the feature extraction capability by using dense connection blocks based on YOLOv5, and improved the detection accuracy by adding a channel focus mechanism and modifying the loss function. Dai et al. [12] proposed TIRNet, which adopted VGG as the feature extractor and used a continuous information fusion strategy to obtain more accurate and smoother detection results. Li et al. [13] designed a dense nested interactive module to achieve progressive interaction among high-level and low-level features. You et al. [14] utilized multiscale mosaic data augmentation to enhance the diversity of objects and proposed a parameter-free attention mechanism to enhance features. Although these methods can be applied to IR target detection, they have some drawbacks. For instance, striped targets in IR road scenes require a more reasonable combination of striped convolution and traditional convolution to extract features. The bounding box loss function of the algorithm needs to be more accurately adapted to the boundary regression of targets in IR images, and the model needs to be more lightweight to be suitable for practical edge devices. Therefore, the method in this paper focuses on improving the above shortcomings.
3. Methods
The structure of our Edge-YOLO model is shown in Figure 2 below.
Firstly, the backbone of the model uses the improved ShuffleBlock to replace the C3 module in YOLOv5 to enhance the feature extraction capability for the characteristics of IR images of road scenes while reducing the complexity of the model.
Secondly, in the feature up-sampling structure, the original nearest-neighbor up-sampling operator is replaced by the improved CAU-Lite module.
Thirdly, although not shown in the figure, we utilize the recently proposed EX-IoU instead of CIoU as the bounding box loss function of our model. This new loss function provides better and more accurate convergence during training, thus leading to improved detection performance.
The CIoU loss [15] used in YOLOv5 is defined as:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v \quad (1)$$

where α is the weight coefficient, $\rho(b, b^{gt})$ is the distance between the center points of the predicted box and the ground truth box, c is the diagonal length of the minimum outer rectangle, and v indicates the difference between the aspect ratios of the predicted box and the ground truth box (v is 0 if they have the same aspect ratio).
The CIoU metric used in the current version of the YOLOv5 algorithm is designed to integrate three aspects: the intersection over union (IoU) between the predicted box and the ground truth box, the center-point distance, and the aspect ratio. However, the aspect ratio used in CIoU is a relative value, which
can introduce uncertainty during calculation and potentially hinder the optimization of the
model. To address this issue, we propose the use of Efficient-IoU (EIoU) as the bounding
box loss function, as proposed by Zhang et al. [16]. EIoU splits the aspect ratio based
on CIoU and replaces the original aspect ratio difference between the predicted box and
ground truth box with the ratio of the width difference between the predicted box and the
ground truth box to the width of the minimum circumscribed rectangle, and the ratio of
the height difference to the height of the minimum outer rectangle. This approach leads to
a more accurate bounding box loss function and facilitates better model optimization. The
formula is as follows:
$$L_{EIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2} \quad (2)$$
where Cw and Ch denote the width and height of the minimum outer rectangle, respectively.
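For concreteness, Equation (2) can be implemented roughly as follows in PyTorch; the function below assumes boxes in (x1, y1, x2, y2) format and is our own illustration of the formula, not code from the paper:

```python
import torch

def eiou_loss(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """EIoU loss (Equation (2)) for boxes given as (..., 4) tensors in xyxy format."""
    # Widths/heights of the predicted and ground truth boxes
    wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    wg, hg = gt[..., 2] - gt[..., 0], gt[..., 3] - gt[..., 1]
    # Intersection area and IoU
    iw = (torch.min(pred[..., 2], gt[..., 2]) - torch.max(pred[..., 0], gt[..., 0])).clamp(min=0)
    ih = (torch.min(pred[..., 3], gt[..., 3]) - torch.max(pred[..., 1], gt[..., 1])).clamp(min=0)
    inter = iw * ih
    iou = inter / (wp * hp + wg * hg - inter + eps)
    # Minimum enclosing rectangle: width Cw, height Ch, squared diagonal c^2
    cw = torch.max(pred[..., 2], gt[..., 2]) - torch.min(pred[..., 0], gt[..., 0])
    ch = torch.max(pred[..., 3], gt[..., 3]) - torch.min(pred[..., 1], gt[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Squared centre distance rho^2(b, b_gt)
    dx = (pred[..., 0] + pred[..., 2] - gt[..., 0] - gt[..., 2]) / 2
    dy = (pred[..., 1] + pred[..., 3] - gt[..., 1] - gt[..., 3]) / 2
    rho2 = dx ** 2 + dy ** 2
    # Width/height penalties rho^2(w, w_gt)/Cw^2 and rho^2(h, h_gt)/Ch^2
    wh = (wp - wg) ** 2 / (cw ** 2 + eps) + (hp - hg) ** 2 / (ch ** 2 + eps)
    return 1 - iou + rho2 / c2 + wh
```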
The literature [17] proposes to use the hyperparameter α as a power on each term in the IoU loss; in its simplest form,

$$L_{\alpha\text{-}IoU} = 1 - IoU^{\alpha} \quad (3)$$
The parameter α is crucial in emphasizing the importance of the loss and gradient
of objects with high IoU, thereby enhancing the accuracy of the bounding box regression.
To improve the bounding box loss function, this paper incorporates the power α into the
EIoU equation, resulting in a new function known as EX-IoU. This function exponentially
magnifies the importance of the IoU value, centroid distance, width difference, or height
difference between any predicted box and the ground truth box, leading to an exponential
reduction effect on losses and an improvement in the accuracy of the bounding box regres-
sion. The optimal value of α is determined through experiments discussed in Section 4.
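The excerpt does not reproduce the final EX-IoU expression. Applying the α-IoU power generalization [17] term by term to Equation (2) suggests a form along the lines of

$$L_{EX\text{-}IoU} = 1 - IoU^{\alpha} + \frac{\rho^{2\alpha}(b, b^{gt})}{c^{2\alpha}} + \frac{\rho^{2\alpha}(w, w^{gt})}{C_w^{2\alpha}} + \frac{\rho^{2\alpha}(h, h^{gt})}{C_h^{2\alpha}}$$

which should be read as a hedged reconstruction rather than the authors' exact definition; α = 3 is the value selected experimentally in Section 4.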
of the output feature map is then mapped back to the input feature map in the feature reassembly module. The $k_{up} \times k_{up}$ region centered at the mapped point is taken out and dotted with the predicted up-sampling kernel at that point to obtain the result. Each channel at the same spatial location on the feature map uses the same up-sampling kernel.
The analysis reveals that CARAFE uses a zero-padding strategy at the edge positions of the feature map in the feature reorganization stage, which leads to imperfect edge information in the up-sampled images and makes it difficult to correctly upsample target features at the edge of the feature map. Based on this, this paper proposes an improved Content-Aware Up-Sampling-Lite (CAU-Lite) method to replace the nearest-neighbor up-sampling method in YOLOv5. Before finding the $k_{up} \times k_{up}$ neighborhood, nearest-neighbor interpolation is used to upsample the input feature map so that its spatial dimension matches that of the up-sampling kernel map. Then, at each spatial position in the feature map, the element of size $1 \times 1 \times k_{up}^2 \times C$ is taken out and reshaped to $k_{up} \times k_{up} \times C$. At the same time, the up-sampling kernel of size $1 \times 1 \times k_{up}^2$ at the corresponding position in the up-sampling kernel map is reshaped to $k_{up} \times k_{up} \times 1$. Each channel of the feature map is dotted with the up-sampling kernel to obtain a result of size $1 \times 1 \times C$, and the results over all channels form the corresponding position in the output feature map. The improved CAU-Lite structure and calculation process are shown in Figure 3.
Figure 3. Structure of CAU-Lite.
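To make the reassembly step concrete, the following PyTorch sketch mirrors the flow described above: nearest-neighbor pre-upsampling, extraction of the $k_{up} \times k_{up}$ neighborhood, and a per-position dot product with a kernel shared across channels. The kernel map is assumed to be predicted elsewhere (e.g., by a small convolutional branch, as in CARAFE [19]) and softmax-normalized; border handling here simply relies on unfold's zero padding, so this is an illustration rather than the authors' implementation:

```python
import torch
import torch.nn.functional as F

def cau_lite_reassemble(x: torch.Tensor, kernels: torch.Tensor,
                        scale: int = 2, k_up: int = 5) -> torch.Tensor:
    """Feature reassembly sketch for CAU-Lite.

    x:       input features, shape (N, C, H, W)
    kernels: predicted up-sampling kernels, shape (N, k_up*k_up, scale*H, scale*W),
             assumed softmax-normalized over the k_up*k_up dimension
    """
    n, c, h, w = x.shape
    # Nearest-neighbour pre-upsampling so the feature map matches the kernel map
    x_up = F.interpolate(x, scale_factor=scale, mode="nearest")
    # k_up x k_up neighbourhood at every position (borders zero-padded by unfold)
    patches = F.unfold(x_up, kernel_size=k_up, padding=k_up // 2)
    patches = patches.view(n, c, k_up * k_up, scale * h, scale * w)
    # Per-position dot product; the kernel is shared across all channels
    return (patches * kernels.unsqueeze(1)).sum(dim=2)
```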
3.3. YOLOv5 Network Model Improvement
The YOLOv5 algorithm is mainly designed for the visible light domain and is better suited for deployment on GPUs due to its high number of model parameters, large computational requirements, and large model size. However, the objective of this paper is to create lightweight target detection networks for edge-embedded devices, making it reasonable to employ a lightweight network structure instead of the heavy CSPDarknet backbone network of YOLOv5. One promising candidate for such a lightweight network model is ShuffleNetv2, proposed by Ma et al. of Megvii's team [20], which achieves a good balance between model accuracy and running speed. By using lightweight structures such as grouped convolution and depthwise convolution, ShuffleNetv2 is optimized for computational complexity, storage access cost, and parallelism, resulting in a noticeable improvement in actual running speed. In light of this, we chose to use an improved version of ShuffleNetv2 as the backbone network structure of the Edge-YOLO algorithm.
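For reference, the channel shuffle operation at the core of ShuffleNetv2 [20] can be written in a few lines of PyTorch; this is a generic rendering of the published operation, not the authors' code:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    """Channel shuffle from ShuffleNetv2: interleave channels across groups so
    that information mixes between the two branches of a ShuffleBlock."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)
```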
The original ShuffleBlock applies a 1 × 1 convolution after the depthwise convolution in its right branch, which generates redundancy. To reduce the number of parameters and computational demand, this paper removes the 1 × 1 convolution layer after the depthwise convolution.
Figure 5. Improved ShuffleBlock. (a): the base unit with SDCA applied; (b): the base unit for down-sampling (2×).
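The exact definition of the strip depthwise convolutional attention (SDCA) module is only partially preserved in this excerpt, so the following PyTorch sketch is a hypothetical reconstruction consistent with the description: depthwise strip convolutions to capture strip-shaped features, combined with a local depthwise convolution and a 1 × 1 channel-mixing convolution whose output re-weights the input. The kernel sizes and composition are our assumptions:

```python
import torch
import torch.nn as nn

class SDCA(nn.Module):
    """Hypothetical sketch of a strip depthwise convolutional attention module.

    A square depthwise convolution gathers local context, horizontal and
    vertical strip depthwise convolutions capture elongated (strip-shaped)
    structures, and a 1x1 convolution mixes channels; the result re-weights
    the input features as an attention map.
    """

    def __init__(self, channels: int, strip_k: int = 11):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_h = nn.Conv2d(channels, channels, (1, strip_k),
                              padding=(0, strip_k // 2), groups=channels)
        self.dw_v = nn.Conv2d(channels, channels, (strip_k, 1),
                              padding=(strip_k // 2, 0), groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.dw(x)
        attn = attn + self.dw_h(attn) + self.dw_v(attn)
        return x * self.pw(attn)  # attention re-weighting of the input
```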
4. Experiments
4.1. Experimental Environment and Dataset
The experiments in this paper were conducted using an Intel Xeon Platinum 8255C CPU and an NVIDIA RTX 3090 GPU with CUDA version 11.7. To evaluate the detection performance of Edge-YOLO, we used the publicly available FLIR dataset, an infrared dataset released by FLIR in 2018. The dataset consists of more than 10,000 images classified into four categories: Person, Bicycle, Car, and Dog. However, since there are only a few Dog images in the dataset, this paper only evaluated the detection performance of Edge-YOLO for the remaining three categories.
4.2. Bounding Box Hyperparameter Study
In the improved EX-IoU bounding box loss function of this paper, there is a hyperparameter α that affects the model's accuracy performance. To determine the optimal value of α for the Edge-YOLO algorithm, we conducted multiple training and testing experiments using different values of α. The accuracy results obtained are shown in Figure 6. From the results, it can be observed that the highest mAP value of 78.8% is achieved when the value of α is set to 3, while the mAP value of the model decreases to 76.9% when the value of α is set to 8. This indicates that the model's detection accuracy improves by 2.47% when using the optimal value of α, and the model achieves its best detection performance. As a result, this paper selects 3 as the power of each term in EX-IoU to obtain the best accuracy performance.
Figure 6. Variation in accuracy for different values of α on the algorithm.
4.3. Model Lightweighting Experiment
By replacing the backbone feature extraction network of YOLOv5 with the improved ShuffleBlock in this paper, i.e., Edge-YOLO shown in Figure 2, the overall number of parameters, computation, and model size of the algorithm model can be effectively reduced, and Table 1 below shows the comparison of each parameter after the model is lightened and improved.
Table 1. Comparison of lightweight improvement effects.
Model | Params/M | Flops/G | Size/MB
YOLOv5m | 20.9 | 47.9 | 42.2
Edge-YOLO | 5.8 | 14.2 | 12.0
The table above shows that by replacing the backbone network with the improved ShuffleBlock, Edge-YOLO reduces the number of network parameters by 72.2%, the amount of computation by 70.3%, and the model size by 71.6% compared with YOLOv5m. This demonstrates the significant lightweight effect of the proposed method, which helps to reduce the storage and computation resources required by the model and is more suitable for deployment on edge-embedded devices.
4.4. Ablation Experiments
In this part, the original ShuffleNetv2 is first used as the backbone network of YOLOv5m. On this basis, ablation experiments on the several improvement strategies proposed in this paper are conducted to better understand the effects of the different improvement strategies on the detection performance of Edge-YOLO; the results are shown in Table 2 below.
Table 2. Ablation experiments.
As can be seen from Table 2, compared with the first group of experiments using only the basic model, the second group of experiments with the addition of EX-IoU solves the problem of uncertainty in the aspect ratio of CIoU by improving the loss function of the
bounding box and accelerating the convergence of the loss function, and the detection
accuracy is improved by 1.2% from the results, while the remaining parameters remain
unchanged. The third group of experiments replaces the original nearest neighbor interpo-
lation up-sampling with the CAU-Lite up-sampling operator proposed in this paper, which
senses and aggregates contextual information within a larger reception field, dynamically
generates adaptive up-sampling kernels, and performs feature reorganization based on the
generated up-sampling kernels. It can be seen that with CAU-Lite, the detection accuracy
of the model is improved by 1.6%, but the FPS is also slightly reduced. The fourth group
of experiments applies the strip depthwise convolutional attention module proposed in
this paper, which replaces the original ShuffleNetv2 network structure with an improved
ShuffleBlock, enhancing the feature extraction capability for strip-shaped targets and the
perception of the saliency of infrared targets. As seen in the table, the detection performance
of the model is significantly improved by 3.1% compared with the original model, but the
number of parameters, computation, and size of the model increased due to the addition
of the new module. The final, fifth set of experiments uses a combination of the three improvement points proposed in this paper; from the results, a large performance improvement is obtained at the cost of only a small amount of additional computational and storage resources compared with the original model.
Figure 7 below shows the P–R curves of each class of different improvement strategies
applied to the base model and the complete Edge-YOLO. The figure shows that compared
with the base model, the APs of all three target categories are improved with different
improvement strategies, and the AP of the bicycle category is improved most significantly.
Figure 7. P–R curves for each class. (a): P–R curves of the class 'Person'; (b): P–R curves of the class 'Bicycle'; (c): P–R curves of the class 'Car'; (d): P–R curves of all classes.
4.5. Comparison Experiments
To further verify the detection performance of the Edge-YOLO algorithm, this section compares Edge-YOLO with Faster R-CNN, SSD, YOLOv5m, YOLOv7 [22], and other mainstream target detection algorithms, and the results are shown in Table 3 below.
Table 3. Comparison of mainstream target detection algorithms.

Model | [email protected] Person/% | [email protected] Bicycle/% | [email protected] Car/% | [email protected]/% | FPS | Params/M | Flops/G | Size/MB
Faster R-CNN | 76.6 | 50.4 | 85.7 | 70.9 | 17.6 | 66.1 | 152.7 | 133.5
SSD | 63.5 | 43.5 | 79.3 | 62.1 | 31.9 | 46.9 | 117.6 | 96.1
YOLOv5m | 83.8 | 64.1 | 90.6 | 79.5 | 55.9 | 20.9 | 47.9 | 42.2
YOLOv7 | 86.4 | 67.3 | 91.4 | 81.7 | 27.9 | 37.2 | 105.1 | 74.8
YOLO-FIRI | 79.5 | 52.4 | 88.6 | 73.5 | 71.4 | 7.2 | 20.4 | 15.0
Edge-YOLO | 83.2 | 63.0 | 90.2 | 78.8 | 80.7 | 5.8 | 14.2 | 12.0
From the table, we can first see that Faster R-CNN, as a two-stage algorithm, lags far behind the single-stage algorithms in detection speed, and its detection accuracy currently holds no advantage over them either; the SSD algorithm speeds up detection compared with Faster R-CNN, but its detection accuracy is reduced accordingly. Neither algorithm is comparable to the current YOLO series. Second, the detection accuracy of the algorithm in this paper is basically the same as that of the YOLOv5m algorithm, but it has obvious advantages in detection speed and in the consumption of computational and storage resources. Again, compared with the latest YOLOv7 algorithm, the detection accuracy of the Edge-YOLO algorithm is slightly behind, but the resources consumed by the YOLOv7 algorithm and its detection speed are completely inferior to those of this paper. Finally, compared with the YOLO-FIRI target detection algorithm, which is likewise lightweight, Edge-YOLO achieves higher detection accuracy and speed while requiring fewer parameters, less computation, and less storage (Table 3).
4.6. Comparison of Test Results
Figure 8 below shows the detection results of some images in the dataset under YOLO-FIRI, YOLOv5m, YOLOv7, and the Edge-YOLO of this paper. Since the Faster R-CNN and SSD in the previous subsection lag in detection accuracy and detection speed, only YOLO-FIRI, YOLOv5m, and YOLOv7 are used in the visualization effect comparison with the algorithm in this paper.
From the figure, it can be seen that compared with YOLO-FIRI, the target detection
algorithm for infrared road scenes, the algorithm in this paper has a certain lead in accuracy and a higher confidence level in targets. In addition, observing the fourth figure, we can see that the YOLO-FIRI algorithm misdetects some pedestrian legs as bicycles, which has some defects. After comparing this algorithm with the YOLOv5m algorithm
and YOLOv7 algorithm, we can see that the three algorithms basically maintain the same
detection results, and they can detect cars, pedestrians, and a small number of bicycles in
road scenes well. Because the algorithm in this paper is a lightweight network model, it is better than the other two algorithms in terms of the number of parameters, computation, and model size, so it has greater practical application value.
4.7. Actual Edge Device Deployment Testing
This paper uses the RK3588 embedded development board of Rockchip as the verification platform, as shown in Figure 9 below. The RK3588 platform is equipped with quad-core A76 + quad-core A55 (an octa-core CPU) and an NPU with 6 TOPS of computing power. Its high-computing-power NPU supports INT4, INT8, INT16, and FP16 mixed computing, which can accelerate the inference of network models. The photo of RK3588 is shown below.
The algorithm model in this paper and the comparison algorithm models are first exported to the compatible ONNX format, and then converted to the RKNN model supported by the NPU of the RK3588 platform using the RKNN-Toolkit2 and rknpu2 tools, with inference acceleration such as asymmetric hybrid quantization. These models are used to infer the test set images, and the resulting performance comparison is shown in the following table. In addition to inference using the NPU, the performance of CPU-only inference is also tested in this paper and is shown together in Table 4 below.
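The conversion flow can be sketched with RKNN-Toolkit2 roughly as follows. The file names, normalization values, and calibration list are placeholders, and the asymmetric hybrid quantization mentioned above uses the toolkit's separate hybrid-quantization interface rather than the plain quantized build shown here:

```python
from rknn.api import RKNN

rknn = RKNN()
# Target the RK3588 NPU; mean/std normalization values are placeholders
rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]],
            target_platform='rk3588')
rknn.load_onnx(model='edge_yolo.onnx')                    # hypothetical ONNX export
rknn.build(do_quantization=True, dataset='./calib.txt')   # calibration image list
rknn.export_rknn('edge_yolo.rknn')                        # model consumed by rknpu2
rknn.release()
```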
Table 4. Testing results on the RK3588 platform.
As can be seen from the table, the accuracy of all four models on the RK3588 platform decreases slightly due to model quantization. In addition, if only the ARM CPU is used for inference, the FPS of algorithms such as YOLO-FIRI is less than 1, i.e., fewer than one image can be inferred per second, and the algorithm in this paper reaches an FPS of only 1.1, which cannot be deployed in practical application scenarios. After using the NPU for acceleration, the inference speed of each algorithm improves by tens of times. However, the FPS of YOLOv5m and YOLOv7 are only 14.5 and 8.8, respectively, which produce noticeable lags in real-world applications, while the algorithm in this paper achieves 31.9 FPS, which can meet the performance requirements of practical scenarios.
5. Conclusions
The proposed method in this paper, Edge-YOLO, is a lightweight IR target detection approach that aims to ensure good performance in road scenes and is suitable for edge-embedded devices. The algorithm utilizes an optimized bounding box loss function, the improved EX-IoU, to enhance the regression accuracy of the bounding box. Moreover, to improve the up-sampling effect, the algorithm adopts the improved CAU-Lite up-sampling operator, which perceives the contextual content. Lastly, the lightweight ShuffleBlock replaces the backbone feature extraction part of the network, and the strip depthwise convolutional attention module is used to enhance the extraction capability for strip-shaped targets and other salient features present in the IR feature map, thus further enhancing the detection accuracy of the model. The experimental results on the FLIR dataset demonstrate that Edge-YOLO is essentially equivalent to YOLOv5m in terms of accuracy, while reducing the number of network parameters, computation, and model size by 72.2%, 70.3%, and 71.6%, respectively. Additionally, the detection speed is increased by 44.4%, making the algorithm more suitable for embedded device applications.
Author Contributions: Conceptualization, J.Y.; methodology, J.L.; software, J.L.; validation, J.L.;
writing—original draft preparation, J.L.; writing—review and editing, J.L. and J.Y.; supervision, J.Y.
All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Chen, C.P.; Li, H.; Wei, Y.; Xia, T.; Tang, Y.Y. A local contrast method for small infrared target detection. IEEE Trans. Geosci. Remote
Sens. 2013, 52, 574–581. [CrossRef]
2. Liu, R.; Lu, Y.; Gong, C.; Liu, Y. Infrared point target detection with improved template matching. Infrared Phys. Technol. 2012, 55,
380–387. [CrossRef]
3. Teutsch, M.; Muller, T.; Huber, M.; Beyerer, J. Low resolution person detection with a moving thermal infrared camera by hot spot
classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH,
USA, 23–28 June 2014; pp. 209–216.
4. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings
of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28.
5. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings
of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham,
Switzerland, 2016; pp. 21–37.
6. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
7. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
8. Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance the learning
capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle,
WA, USA, 14–19 June 2020; pp. 390–391.
9. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans.
Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [CrossRef]
10. Li, S.; Li, Y.; Li, Y.; Li, M.; Xu, X. YOLO-FIRI: Improved YOLOv5 for Infrared Image Object Detection. IEEE Access 2021, 9,
141861–141875. [CrossRef]
11. Fan, Y.; Qiu, Q.; Hou, S.; Li, Y.; Xie, J.; Qin, M.; Chu, F. Application of Improved YOLOv5 in Aerial Photographing Infrared
Vehicle Detection. Electronics 2022, 11, 2344. [CrossRef]
12. Dai, X.; Yuan, X.; Wei, X. TIRNet: Object detection in thermal infrared images for autonomous driving. Appl. Intell. 2021, 51,
1244–1261. [CrossRef]
13. Li, B.; Xiao, C.; Wang, L.; Wang, Y.; Lin, Z.; Li, M.; An, W.; Guo, Y. Dense nested attention network for infrared small target
detection. IEEE Trans. Image Process. 2022, 32, 1745–1758. [CrossRef] [PubMed]
14. You, S.; Ji, Y.; Liu, S.; Mei, C.; Yao, X.; Feng, Y. A thermal infrared pedestrian-detection method for edge computing devices.
Sensors 2022, 22, 6710. [CrossRef] [PubMed]
15. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. Proc.
AAAI Conf. Artif. Intell. 2020, 34, 12993–13000. [CrossRef]
16. Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression.
Neurocomputing 2022, 506, 146–157. [CrossRef]
17. He, J.; Erfani, S.; Ma, X.; Bailey, J.; Chi, Y.; Hua, X.S. Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding
Box Regression. Adv. Neural Inf. Process. Syst. 2021, 34, 20230–20242.
18. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
19. Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. Carafe: Content-aware reassembly of features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016.
20. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of
the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
21. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
22. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object
detectors. arXiv 2022, arXiv:2207.02696.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.