UAV Target Detection Algorithm Based On Improved YOLOv8
ABSTRACT Since UAVs usually fly at high altitudes, small targets account for a large proportion of the imaged scene, which challenges current target detection algorithms; in addition, the high-speed flight of UAVs blurs the detected objects, making target feature extraction difficult. To address these two problems, we propose a UAV target detection algorithm based on improved YOLOv8. First, the small target detection structure (STC) is embedded in the network, acting as a bridge between shallow and deep features to improve the collection of semantic information of small targets and enhance detection accuracy. Second, exploiting the fact that UAV imaging focuses on the global information of targets, the global attention mechanism GAM is introduced into the bottom layer of YOLOv8m's backbone to prevent the loss of image feature information during sampling, increasing the algorithm's detection performance by feeding back feature information of different dimensions. According to experimental results on the VisDrone2021 dataset, the modified model effectively improves the detection of tiny targets, reaching an mAP of 39.3%, which is 4.4% higher than the baseline approach, and outperforms mainstream algorithms such as SSD and the YOLO series, effectively increasing the detection performance of UAVs for small targets.
INDEX TERMS UAV target detection, global attention mechanism, small target detection.
several application areas. Especially for embedded systems, the required computation time is too long. Similarly, many methods sacrifice detection accuracy for detection speed. To solve the problem of making accuracy and speed coexist, regression-based methods such as YOLO [6] and SSD [7] emerged. Such methods directly regress the coordinates of the bounding box and the object class at multiple locations of the input image. In target detection tasks, the YOLO series algorithms have been popular in research, and the latest YOLOv8 [8] series shows excellent performance in the field of target detection. This work provides an enhanced YOLOv8 algorithm for the industrial UAV target detection task and obtains good results on the UAV dataset VisDrone2021. This paper's primary contributions are as follows:
1) The small target connection (STC) layer is added to stitch the shallower feature maps with the deeper feature maps, improving the fusion of features from different levels, reducing the loss of semantic information during sampling, and improving detection performance on UAV-captured images.
2) The global attention mechanism (GAM) is introduced into the backbone network module (Backbone) to enable the algorithm to capture feature information in multiple dimensions, fully exploiting the visual representation of the receptive field in each dimension and improving performance in the real acquisition process.

Zhu et al. [13] introduced transformer prediction heads (TPH) into YOLOv5 for detection in drone-captured scenarios, but the addition of TPH brings a huge number of parameters, which affects the network's computing speed. Xia et al. [14] developed a way to convert small UAV target detection into residual image prediction by integrating residual blocks into U-Net and combining feature fusion at different scales for image reconstruction, enhancing detection performance on UAV-collected data. Fang et al. [15] likewise converted UAV detection into residual image prediction by learning a nonlinear mapping from the input image to the residual image, and introduced multiscale feature fusion for comprehensive aggregation to enhance detection capability.
to integrate the model. To address the shortcomings of the above literature, this paper introduces the latest YOLOv8 and uses it as the baseline system to propose a small target detection algorithm based on an improved YOLOv8, tailored to the characteristics of small target detection tasks from the UAV perspective. To limit the rise in the number of parameters caused by global attention, only one layer is added in the Backbone module, connected to the spatial pyramid pooling structure.

III. METHOD
YOLOv8 is the most recent version of Ultralytics' YOLO object detection and image segmentation model. To make the model lightweight, YOLOv8 replaced the C3 module of YOLOv5 with a C2f module. The C3 module is mainly designed with CSPNet's idea of splitting the feature stream, combined with a residual structure: in the so-called C3 block, the main CSP gradient branch is the BottleNeck module, and the number of stacked BottleNecks is controlled by the parameter n. As shown in Fig.1, three convolutional structures (Conv+BN+SiLU) and n BottleNeck modules are used in the C3 block.

FIGURE 1. Three convolutional structures and n BottleNeck modules are used in the C3 module structure.
The C2f module inherits the advantages of the C3 module and of the ELAN [25] module in YOLOv7 [26]: through more branching cross-layer links, it obtains richer gradient flow information while remaining lightweight. The structure of the C2f module is shown in Fig.2.

FIGURE 2. The C2f module is designed with reference to the C3 module as well as the idea of ELAN, so that YOLOv8 can obtain richer gradient flow information while remaining lightweight.
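To make this structure concrete, below is a minimal PyTorch sketch of a C2f-style block. The Conv (Conv+BN+SiLU) and BottleNeck definitions follow the description above; the channel choices and exact bottleneck layout are simplifying assumptions rather than the exact Ultralytics implementation.

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Conv + BN + SiLU, the CBS block used throughout YOLOv8."""
    def __init__(self, c1, c2, k=1, s=1, p=None):
        super().__init__()
        p = k // 2 if p is None else p
        self.conv = nn.Conv2d(c1, c2, k, s, p, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """Two 3x3 convs with an optional residual shortcut."""
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = Conv(c, c, 3)
        self.cv2 = Conv(c, c, 3)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class C2f(nn.Module):
    """C2f: split the features, run n BottleNecks, and concatenate every
    intermediate output (the ELAN-style cross-layer links) before fusing."""
    def __init__(self, c1, c2, n=1, shortcut=True):
        super().__init__()
        self.c = c2 // 2                          # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1)        # produces the two splits
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # fuses all branches
        self.m = nn.ModuleList(Bottleneck(self.c, shortcut) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))  # split into two halves
        for m in self.m:
            y.append(m(y[-1]))                 # each BottleNeck feeds the next
        return self.cv2(torch.cat(y, dim=1))   # richer gradient flow via concat
```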
YOLOv8 introduced the Decoupled-Head [27] structure to extract target location and category information separately, learn them through different network branches, and then fuse them. This design successfully reduced the number of parameters and the computational complexity while improving the model's generalization and robustness. The Decoupled-Head structure is shown in Fig.3. In the Head module, the anchor-free mechanism of YOLOX was also introduced to directly predict the edges of small targets; it filtered out the noise interference terms of the labels, reduced the model's hyperparameters to reduce network complexity, and brought the computational speed closer to the high-speed reputation of the YOLO series.
FIGURE 3. The structure of the decoupled head. The computation splits into classification and regression branches, where the regression branch is represented using the integral form of the distribution focal loss.
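As a sketch of this decoupling, the head below runs classification and regression through separate branches, reusing the Conv block defined in the C2f sketch above. The branch depths, the number of DFL bins (reg_max), and the channel widths are illustrative assumptions, not the exact YOLOv8 head.

```python
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Separate cls and reg branches per feature level (illustrative)."""
    def __init__(self, c_in, num_classes, reg_max=16):
        super().__init__()
        # classification branch: two 3x3 convs, then a 1x1 class predictor
        self.cls_branch = nn.Sequential(
            Conv(c_in, c_in, 3), Conv(c_in, c_in, 3),
            nn.Conv2d(c_in, num_classes, 1))
        # regression branch: two 3x3 convs, then reg_max DFL bins per box side
        self.reg_branch = nn.Sequential(
            Conv(c_in, c_in, 3), Conv(c_in, c_in, 3),
            nn.Conv2d(c_in, 4 * reg_max, 1))

    def forward(self, x):
        return self.cls_branch(x), self.reg_branch(x)  # (cls, reg) outputs
```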
For the UAV aerial photography scene, with its large background area and small targets, the detected objects also suffer from mutual occlusion. This work therefore provides an enhanced YOLOv8 method aimed at small targets to reduce false and missed detections throughout the detection phase. The improved YOLOv8 framework consists of three main modules, Backbone, Neck, and Head, and uses adaptive anchor frames to compute the best anchor frame values on different training sets after preprocessing the image data with Mosaic [28] data enhancement. The Backbone module includes a CBS structure and a C2f structure for extracting feature information from the input image. The Neck module combines the feature pyramid FPN [29] with the path aggregation network PAN [30]: the FPN passes the strong semantic features down from the higher levels to strengthen semantic information, while the PAN complements the FPN from the bottom up by passing feature information up from the lower levels; the combined PAN-FPN module enhances network feature integration and outputs the obtained image features to the Head module. The Head module, through its two heads, outputs cls and reg respectively and finally predicts the category using the bounding boxes generated from the image features. The structure of the modified network is shown in Fig.4.

FIGURE 4. The improved structure is mainly composed of three main modules, including Backbone, Neck, and Head. We add GAM attention to the backbone layer to amplify global cross-dimension target interaction features while reducing the loss of key information. In the Neck structure, we improve the downsampling of the original framework and make the model focus more on small target detection by adding a small target detection layer.
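The following sketch shows one way the top-down/bottom-up exchange described above can be wired, reusing the Conv and C2f blocks from the earlier sketch; equal channel counts per level and nearest-neighbor upsampling are assumptions for brevity.

```python
import torch
import torch.nn as nn

class PANFPN(nn.Module):
    """Illustrative PAN-FPN neck: the FPN top-down pass spreads semantics,
    the PAN bottom-up pass restores localization detail."""
    def __init__(self, c):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        self.fuse_td4 = C2f(2 * c, c)    # top-down fusion at P4
        self.fuse_td3 = C2f(2 * c, c)    # top-down fusion at P3
        self.down3 = Conv(c, c, 3, 2)    # stride-2 conv for the bottom-up path
        self.down4 = Conv(c, c, 3, 2)
        self.fuse_bu4 = C2f(2 * c, c)
        self.fuse_bu5 = C2f(2 * c, c)

    def forward(self, p3, p4, p5):
        # FPN: deep semantic features passed down to the shallower levels
        t4 = self.fuse_td4(torch.cat([self.up(p5), p4], dim=1))
        t3 = self.fuse_td3(torch.cat([self.up(t4), p3], dim=1))
        # PAN: shallow detail passed back up, complementing the FPN
        b4 = self.fuse_bu4(torch.cat([self.down3(t3), t4], dim=1))
        b5 = self.fuse_bu5(torch.cat([self.down4(b4), p5], dim=1))
        return t3, b4, b5   # multi-scale features sent to the Head
```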
A. YOLOv8 BASED ON GAM ATTENTION MECHANISM
The feature extraction module extracts the local features of the input image; because the targets in UAV images are usually small, detection is difficult. To improve the model's detection accuracy, the Global Attention Mechanism (GAM) [31] is introduced in the backbone network module (Backbone). It captures crucial information in all three dimensions and significantly reduces the detection difficulty of small targets in UAV scenes.

GAM attention can amplify global cross-dimension target interaction features with reduced loss of critical information. The GAM model uses novel channel-spatial attention to replace the sub-modules of the original CBAM model. The process of extracting features is shown in Fig.5, and the following equations define the intermediate stages and outputs for a given input feature map:

F2 = Mc(F1) ⊗ F1  (1)
F3 = Ms(F2) ⊗ F2  (2)

where Mc and Ms denote the channel attention module and the spatial attention module, respectively; ⊗ denotes element-wise multiplication; and F1, F2, and F3 denote the input features, intermediate features, and output features, respectively.

FIGURE 5. The structural diagram of GAM attention, which borrows the sequential channel-spatial attention mechanism from CBAM attention and redesigns the sub-modules, processing a given feature map with the channel module and then the spatial module to output the final result.
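Equations (1) and (2) translate directly into a two-step forward pass. A minimal sketch follows, assuming the ChannelGate and SpatialGate modules sketched in the next two subsections stand in for Mc and Ms:

```python
import torch.nn as nn

class GAM(nn.Module):
    """Sequential channel-then-spatial attention, following Eqs. (1)-(2)."""
    def __init__(self, channels, rate=4):
        super().__init__()
        self.mc = ChannelGate(channels, rate)   # channel attention module Mc
        self.ms = SpatialGate(channels, rate)   # spatial attention module Ms

    def forward(self, f1):
        f2 = self.mc(f1) * f1   # F2 = Mc(F1) ⊗ F1, Eq. (1)
        f3 = self.ms(f2) * f2   # F3 = Ms(F2) ⊗ F2, Eq. (2)
        return f3
```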
elements in different dimensions. The module structure is shown in Fig.6.

FIGURE 6. The channel attention submodule uses a three-dimensional arrangement to preserve three-dimensional information. It then amplifies the channel-space dependencies across dimensions with a two-layer multilayer perceptron (MLP).
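A sketch of such a channel submodule is given below: the feature map is permuted so the channel axis comes last, passed through a two-layer MLP with a reduction ratio, permuted back, and squashed into an attention map. The reduction rate and activation are assumptions in line with the CBAM-style design described above.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """GAM-style channel submodule: 3D permutation + two-layer MLP (sketch)."""
    def __init__(self, channels, rate=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // rate),   # reduce
            nn.ReLU(inplace=True),
            nn.Linear(channels // rate, channels))   # restore

    def forward(self, x):
        y = x.permute(0, 2, 3, 1)   # (B, C, H, W) -> (B, H, W, C), keeps 3D info
        y = self.mlp(y)             # cross-dimension channel dependencies
        y = y.permute(0, 3, 1, 2)   # back to (B, C, H, W)
        return torch.sigmoid(y)     # channel attention map Mc
```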
2) SPATIAL ATTENTION SUBMODULE
In the spatial attention sub-module, to focus on more spatial information, the model uses two convolutional layers to perform spatial information fusion. The structure eliminates the pooling operation to further protect local feature information, because max pooling would lose some of the spatial information. Although this may increase the number of parameters, it collects spatial information more completely and is less likely to ignore part of the feature map. The module structure is shown in Fig.7.

FIGURE 7. In the spatial attention sub-module, two convolutional layers are used for spatial information fusion to focus on spatial information. Pooling operations were removed from the sub-module to further preserve the feature map, and, to prevent a significant increase in parameters, group convolution with channel shuffling was used.
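A corresponding sketch of the spatial submodule uses two 7×7 convolutions with a channel reduction in between and no pooling; plain convolutions are used here for simplicity, though the figure caption notes that group convolution with channel shuffling can curb the parameter growth.

```python
import torch
import torch.nn as nn

class SpatialGate(nn.Module):
    """GAM-style spatial submodule: two conv layers, no pooling (sketch)."""
    def __init__(self, channels, rate=4, k=7):
        super().__init__()
        self.reduce = nn.Sequential(      # 7x7 conv, channel reduction
            nn.Conv2d(channels, channels // rate, k, padding=k // 2),
            nn.BatchNorm2d(channels // rate),
            nn.ReLU(inplace=True))
        self.restore = nn.Sequential(     # 7x7 conv, channel restoration
            nn.Conv2d(channels // rate, channels, k, padding=k // 2),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        return torch.sigmoid(self.restore(self.reduce(x)))  # spatial map Ms
```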
In the UAV target detection task, characteristics such as long imaging distance, high speed, and the motion blur produced by low-altitude flight result in small target objects whose features are difficult to capture, which leads to a large amount of training on invalid areas and in turn reduces the training efficiency of the network. The network structure is shown in Fig.8. After global attention GAM is added to the feature extraction layer of the YOLOv8 target detection model, feature information of different dimensions is collected and fed back to reduce the loss of imaging feature information during the sampling process; the visual representation of the receptive field in each dimension is fully utilized, which yields a performance improvement in the real acquisition process.
B. STC MODULE IN NECK MULTISCALE FEATURE FUSION
The VisDrone2021 [32] dataset used for the experiments in this paper has a large proportion of small targets. The width-height distribution of the targets in the dataset is shown in Fig.9, where the horizontal and vertical axes are width and height, respectively. The distribution of points near the coordinate origin is dense and darkest in color, which indicates that small targets are the most numerous in the data and fits the research problem of this paper. In the UAV aerial photography scene, most detection algorithms perform better on large and medium targets than on small targets; too many small targets in the dataset therefore prevent a detection algorithm from reaching its full performance.

In target detection, a shallow network has a small receptive field and weak semantic information but strong detail representation ability. As the network becomes larger and deeper, the detail representation ability of its receptive field weakens, so a model that is too deep or too shallow will hurt the accuracy of target detection. Because the detection quality of the YOLO series is reduced by the small size of small target samples, and because the upsampling factor of YOLOv8 is relatively large, it is difficult for the deeper feature maps to learn the feature information of small targets. This paper therefore proposes to improve small target detection by adding a small target detection layer, STC, after stitching in the shallow feature map. The STC layer connects the shallow feature map with the deep feature map to increase the gathering of semantic information of small targets. With the STC module connecting the deep and shallow networks, the network pays greater attention to small target detection, which significantly improves the algorithm's detection performance on UAV-captured photos.
The STC module is shown in Fig.10. It is a structure derived from the original YOLOv8 modules and is used to solve the loss of feature information caused by the large upsampling factors of the original YOLOv8. The original model defaults to three detection heads, corresponding to feature map sizes of 80*80, 40*40, and 20*20, and the Head part outputs feature maps at a total of six scales for classification and regression. The category prediction branches and box prediction branches of the three scales are spliced and dimensionally transformed. Whereas the original YOLOv8 head has three output layers, we add a fourth upsampling layer and connect it to an additional output layer, corresponding to a detection scale of 160*160, to improve the accuracy of small target detection. Structurally, the STC consists of two C2f modules, an upsampling layer, and a general convolutional layer, as shown in Fig.10.

FIGURE 10. The STC structure consists of two C2f modules, an upsampling layer, and a general convolutional layer. The input feature map is changed from 80*80 to 160*160 by 2-fold upsampling, the concat operation is then performed in the channel dimension, and finally the output is processed through the decoupled head.
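The sketch below wires these pieces together in one plausible order (C2f, 2× upsampling, channel-wise concat with the shallow map, C2f, then a convolution), reusing the Conv and C2f blocks defined earlier; the exact ordering and channel arguments are assumptions read off the Fig.10 caption rather than the authors' published configuration.

```python
import torch
import torch.nn as nn

class STC(nn.Module):
    """Small target connection layer: bridge a deep 80x80 map to a shallow
    160x160 backbone map for an extra small-object detection head (sketch)."""
    def __init__(self, c_deep, c_shallow, c_out, n=3):
        super().__init__()
        self.refine = C2f(c_deep, c_deep, n)                   # first C2f (deep path)
        self.up = nn.Upsample(scale_factor=2, mode='nearest')  # 80x80 -> 160x160
        self.fuse = C2f(c_deep + c_shallow, c_out, n)          # second C2f after concat
        self.conv = Conv(c_out, c_out, 3)                      # general convolution

    def forward(self, deep, shallow):
        x = self.up(self.refine(deep))       # 2-fold upsampling of deep features
        x = torch.cat([x, shallow], dim=1)   # concat in the channel dimension
        return self.conv(self.fuse(x))       # feeds the extra 160x160 head
```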
sets of experiments were done in the same experimental context to evaluate the impact of each enhanced module on the baseline model; the experimental results are detailed in Table 2.

Performance of the global attention mechanism. In Table 2, with the addition of the global attention GAM, mAP improves by 0.6% and accuracy improves by 2%, so the accuracy of the algorithm is improved at the cost of only a small number of additional parameters. The effectiveness of global attention on top of the model is thus verified.

Performance analysis of the STC small target detection layer. Comparing the results in Table 2 after adding the STC module with the baseline model, the small target detection (STC) module brings a more substantial overall improvement: the improved model gains 3.9% in accuracy and 2.9% in recall. Because the network pays more attention to small target recognition once the STC module connects the deep and shallow networks, it effectively improves the algorithm's detection performance on UAV-captured photos. The overall mAP improves by 4.1%, and in the car category it improves by 5.1%. Combining the experimental results in Table 2, we consider the model improvement effective.
Performance analysis of the combined modules. Given the improvement from each of the above modules, we combined the global attention module and the small target detection module and compared the combination with the baseline model. mAP, accuracy, and recall improve by 4.4%, 4.2%, and 2.8% over the baseline model, respectively; by 3.8%, 2.2%, and 2.9% over adding the global attention module alone; and by 0.3% and 0.4% over adding the small target detection module alone. Compared with the previous experiments on the separate modules, all accuracies increased, verifying that the combined module outperforms the individual component modules and is not a simple accumulation of accuracy.

TABLE 3. Comparison of experimental results for multilayer attention addition.

In addition, we verified where the attention mechanism should be added and the effect of adding multiple layers of attention on detection performance. Attention was added after the first C2f layer; after the first and second C2f layers; after the first and third C2f layers; and, as in the method of this paper, before the SPPF module. These configurations are named G1, G2, G3, and G4 in the order of the experiments, and the results are shown in Table 3. Comparing the data in the table, the best result is obtained when the attention mechanism is added in front of the SPPF structure, as in this paper. With each added layer of GAM attention, the number of parameters increases and the experimental accuracy shows a small rise, but there is a slight decrease in practical accuracy when attention is added at two C2f layers plus the SPPF layer compared with adding only one layer of SPPF attention. The experiments show the optimal performance of the attention placement used in this paper.

3) COMPARISON EXPERIMENT
To verify the effectiveness of the algorithm in this paper on the dataset VisDrone2021, other classical state-of-the-art (SOTA) models were selected for comparison, including the classical YOLO series networks, the anchor-free algorithm CenterNet, the two-stage algorithm Faster-RCNN, and some improved algorithms of the current stage. The algorithms are evaluated by the mAP values of each category and of the overall algorithm, and the comparison results are shown in Table 4. The algorithm proposed in this paper is more effective in detection accuracy than the current-stage SOTA models, and its mAP improves by 15.46%, 12.02%, and 8.19% over the classical YOLOv3, YOLOv4, and YOLOX models, respectively. Compared with the popular TPH-YOLOv5 algorithm, mAP improves by 1.98%; the detection algorithm in this paper achieves 78.5% and 62.5% accuracy in the car and bus categories, respectively, far exceeding other models in the same categories. The improved algorithm model achieves 39.3% mAP on the dataset VisDrone2021, indicating the network structure's high performance for UAV target detection tasks.
V. CONCLUSION
In this paper, a UAV detection algorithm based on small target detection and an attention mechanism was proposed to address the problem that the small size of targets in UAV inspection imaging leads to unsatisfactory feature extraction and thus degrades the algorithm's detection of small targets. The algorithm first introduces the global attention GAM in the backbone network module (Backbone) to capture imaging features in multiple dimensions and reduce the detection difficulty of small targets in UAV scenes. The model then redeems the loss of semantic information during sampling by adding the STC module to connect the network's shallow and deep structures, fully capturing global information and rich contextual information and effectively improving the algorithm's detection of small targets. Experimental results show that the mAP, recall, and accuracy of the algorithm in this paper are all improved over the baseline model, and the improved algorithm model achieves 39.3% mAP on the dataset VisDrone2021. Although the method in this research improves the detection of small targets in UAV photography, due to the substantial rise in the number of parameters there is still room for improvement in reducing false and missed detections of small targets. The next step is to study how to balance accuracy and lightness, improving small target detection performance while keeping the model light, to better serve real-time industrial UAV inspection and to respond effectively to different scene transformations.
REFERENCES
[1] X. Zihao, W. Hongyuan, Q. Pengyu, D. Weidong, Z. Ji, and C. Fuhua, "Printed surface defect detection model based on positive samples," Comput., Mater. Continua, vol. 72, no. 3, pp. 5925–5938, 2022.
[2] B. Jiang, R. Qu, Y. Li, and C. Li, "Survey of object detection in UAV imagery based on deep learning," Acta Aeronautica et Astronautica Sinica, vol. 42, no. 4, pp. 137–151, 2021.
[3] M. M. Fernandez-Carrobles, O. Deniz, and F. Maroto, "Gun and knife detection based on faster R-CNN for video surveillance," in Pattern Recognition and Image Analysis. Madrid, Spain: Springer, 2019, pp. 441–452.
[4] J. Cao, H. Cholakkal, R. M. Anwer, F. S. Khan, Y. Pang, and L. Shao, "D2Det: Towards high quality object detection and instance segmentation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 11482–11491.
[5] U. Mittal, P. Chawla, and R. Tiwari, "EnsembleNet: A hybrid approach for vehicle detection and estimation of traffic density based on faster R-CNN and YOLO models," Neural Comput. Appl., vol. 35, no. 6, pp. 4755–4774, Feb. 2023.
[6] P. Jiang, D. Ergu, F. Liu, Y. Cai, and B. Ma, "A review of YOLO algorithm developments," Proc. Comput. Sci., vol. 199, pp. 1066–1073, Jan. 2022.
[7] K. R. Akshatha, A. K. Karunakar, S. B. Shenoy, A. K. Pai, N. H. Nagaraj, and S. S. Rohatgi, "Human detection in aerial thermal images using faster R-CNN and SSD algorithms," Electronics, vol. 11, no. 7, p. 1151, Apr. 2022.
[8] J.-H. Kim, N. Kim, and C. S. Won, "High-speed drone detection based on YOLO-V8," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2023, pp. 1–2.
[9] Q. Wang, H. Zhang, X. Hong, and Q. Zhou, "Small object detection based on modified FSSD and model compression," in Proc. IEEE 6th Int. Conf. Signal Image Process. (ICSIP), Oct. 2021, pp. 88–92.
[10] J. Pang, K. Chen, J. Shi, H. Feng, W. Ouyang, and D. Lin, "Libra R-CNN: Towards balanced learning for object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 821–830.
[11] J.-S. Lim, M. Astrid, H.-J. Yoon, and S.-I. Lee, "Small object detection using context and attention," in Proc. Int. Conf. Artif. Intell. Inf. Commun. (ICAIIC), Apr. 2021, pp. 181–186.
[12] Z. Feng, Z. Xie, Z. Bao, and K. Chen, "Real time dense small target detection algorithm for UAV based on improved YOLOv5," Acta Aeronaut. Sin., pp. 1–15, 2022.
[13] X. Zhu, S. Lyu, X. Wang, and Q. Zhao, "TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios," in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops (ICCVW), Oct. 2021, pp. 2778–2788.
[14] H. Fang, M. Xia, G. Zhou, Y. Chang, and L. Yan, "Infrared small UAV target detection based on residual image prediction via global and local dilated residual networks," IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022.
[15] H. Fang, L. Ding, L. Wang, Y. Chang, L. Yan, and J. Han, "Infrared small UAV target detection based on depthwise separable residual dense network and multiscale feature fusion," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–20, 2022.
[16] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention," in Proc. Int. Conf. Mach. Learn., 2015, pp. 2048–2057.
[17] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 7132–7141.
[18] J. Hu, L. Shen, S. Albanie, G. Sun, and A. Vedaldi, "Gather-excite: Exploiting feature context in convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 31, 2018, pp. 1–11.
[19] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 3–19.
[20] Q. Hou, D. Zhou, and J. Feng, "Coordinate attention for efficient mobile network design," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 13708–13717.
[21] A. G. Roy, N. Navab, and C. Wachinger, "Recalibrating fully convolutional networks with spatial and channel 'squeeze & excitation' blocks," IEEE Trans. Med. Imag., vol. 38, no. 2, pp. 540–549, Feb. 2019.
[22] Y. Cao, J. Xu, S. Lin, F. Wei, and H. Hu, "GCNet: Non-local networks meet squeeze-excitation networks and beyond," in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshop (ICCVW), Oct. 2019, pp. 1971–1980.
[23] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, "ECA-Net: Efficient channel attention for deep convolutional neural networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 11531–11539.
[24] H. Fang, Z. Liao, X. Wang, Y. Chang, and L. Yan, "Differentiated attention guided network over hierarchical and aggregated features for intelligent UAV surveillance," IEEE Trans. Ind. Informat., vol. 19, no. 9, pp. 9909–9920, Sep. 2023.
[25] Y. Wang, H. Wang, and Z. Xin, "Efficient detection model of steel strip surface defects based on YOLO-V7," IEEE Access, vol. 10, pp. 133936–133944, 2022.
[26] C.-Y. Wang, A. Bochkovskiy, and H.-Y. Mark Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," 2022, arXiv:2207.02696.
[27] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, "YOLOX: Exceeding YOLO series in 2021," 2021, arXiv:2107.08430.
[28] A. Bochkovskiy, C.-Y. Wang, and H.-Y. Mark Liao, "YOLOv4: Optimal speed and accuracy of object detection," 2020, arXiv:2004.10934.
[29] B. Pu, Y. Lu, J. Chen, S. Li, N. Zhu, W. Wei, and K. Li, "MobileUNet-FPN: A semantic segmentation model for fetal ultrasound four-chamber segmentation in edge computing environments," IEEE J. Biomed. Health Informat., vol. 26, no. 11, pp. 5540–5550, Nov. 2022.
[30] G. Wan, H. Fang, D. Wang, J. Yan, and B. Xie, "Ceramic tile surface defect detection based on deep learning," Ceram. Int., vol. 48, no. 8, pp. 11085–11093, Apr. 2022.
[31] Y. Liu, Z. Shao, and N. Hoffmann, "Global attention mechanism: Retain information to enhance channel-spatial interactions," 2021, arXiv:2112.05561.
[32] W. Xu, C. Zhang, Q. Wang, and P. Dai, "FEA-swin: Foreground enhancement attention Swin transformer network for accurate UAV-based dense object detection," Sensors, vol. 22, no. 18, p. 6993, Sep. 2022.
[33] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, and L. Antiga, "PyTorch: An imperative style, high-performance deep learning library," in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 1–12.

ZHIYONG QIN received the B.E. degree from the Huaide College, Changzhou University, in 2022, where he is currently pursuing the M.E. degree. His research interests include computer vision and object detection.
HONGYUAN WANG received the Ph.D. degree in computer science from the Nanjing University of Science and Technology. He is currently a Professor, a Ph.D. Supervisor, and a Senior Member of CCF. His research interests include computer vision, pattern recognition, and intelligent systems.

JIAYING TANG received the B.E. degree from Tianjin Chengjian University, in 2022. She is currently pursuing the M.E. degree with Changzhou University. Her research interests include computer vision and image segmentation.