
Autonomous Unmanned Vehicle Automatic Visual Tracking Based on SLAM and YOLO Algorithm

Xiaolei Qu1,2 , Jiaxing Wang1 , Xiulin Zhang1 , Xinyu Feng1 , Yang Du4 , Ke Li3(B) ,
and Lijing Wang3
1 Avic Shenyang Aircraft Design and Research Institute, Beijing, China
2 Northwestern Polytechnical University, Xi'an, China
3 School of Aeronautical Science and Engineering, Beihang University, Beijing, China
[email protected]
4 91001 Army, Beijing, China

Abstract. In a UAV distribution center, using unmanned vehicles as carrier platforms for UAVs compensates for the limited time UAVs can stay in the air while saving manpower and material resources. The vehicle autonomously completes navigation and obstacle avoidance in known environments and, through its human-computer interaction function, provides target tracking capability, which greatly facilitates the centralized and distributed management of UAVs. In this paper, SLAM and YOLO algorithms are utilized to implement automatic visual tracking on the autonomous unmanned vehicle and to achieve collaboration between the unmanned vehicle and UAVs in a known experimental environment.

Keywords: Air-ground coordination · autonomous navigation and obstacle avoidance · target tracking

1 Introduction
1.1 Visual Tracking History
Visual tracking started from statistical pattern recognition in the 1950s. At that time, it mainly focused on the analysis and recognition of two-dimensional images, and by the 1980s a relatively complete system had been formed. After that, many new visual tracking methods appeared, which can be divided into four categories: region-based tracking, feature-based tracking, deformable-template-based tracking, and model-based tracking [5]. With the rapid development of machine learning, tracking models increasingly use machine learning methods. In 1998, Isard treated visual tracking as a non-linear estimation problem and introduced the particle filter method, proposing the CONDENSATION algorithm, which is the basis of online update models [6]. In 2009, Mei et al. combined the particle filter framework with sparse representation and proposed the L1 tracker (L1T) [7]. The model uses an online update mechanism to capture the dynamic information
of the tracking target, which improves the robustness of the algorithm and the tracking accuracy. In 2013, Wang et al. introduced a dictionary representation into this framework, which can model occlusion and complex backgrounds during tracking; the algorithm also uses the Huber loss function to optimize the model, which significantly improves robustness and accuracy [8]. In 2011, Hare et al. proposed the Struck algorithm [9]. The algorithm expresses the transition of the target state between two frames by designing a structured output, and learns the joint distribution of images and state transitions through an SVM. To solve the model drift problem of tracking algorithms, Zhang et al. proposed the MEEM algorithm in 2014 [10]. The algorithm designs multiple expert trackers based on SVMs and selects appropriate experts to track and to update the expert models according to specific criteria. In 2011, Kalal et al. proposed the TLD algorithm [11]. The algorithm divides visual tracking into three processes: tracking, learning, and detection. Through a novel P-N learning algorithm, the robustness and accuracy of the model are well guaranteed.
For the tracking problem there are many unknown factors, such as occlusion, changes in lighting conditions, large deformations of the target, and similar distractor objects of the same type, so different discriminative features may need to be selected in different situations to obtain a more accurate match. Therefore, tracking models based on ensemble methods appeared, such as the MEEM algorithm [10]. In 2018, Wang et al. proposed the MCCT model, which designs multiple correlation-filter-based discriminators using different feature combinations and then selects the most suitable discriminator according to corresponding selection criteria to achieve high-precision tracking [21].

1.2 YOLO Algorithm


1.2.1 Principle of YOLO Algorithm
The YOLO algorithm uses a CNN designed for the target detection task to extract features during detection, and uses fully connected layers to predict the position of the identified target and to classify it. The network model of the YOLO algorithm consists of an input layer, convolutional layers, pooling layers, and fully connected layers, as shown in Fig. 1.
The input layer of the YOLO algorithm resizes the input image to a three-channel 448 * 448 * 3 image, where the three channels correspond to the red, green, and blue primary colors. The fully connected layer requires a fixed-size vector as input, and a fixed original image size is needed when feeding data back to the preceding layers, so the input image is resized to a fixed size of 448 * 448.
After the input layer come 24 convolutional layers. Feature maps are obtained by convolution operations on the input image, and the feature data are recorded to facilitate subsequent classification and target localization. The YOLO network model uses 3 * 3 and 1 * 1 convolution kernels, where the 1 * 1 kernels are mainly used to reduce the number of channels and thus the number of subsequent operating parameters.
Fig. 1. YOLO network

The pooling layer of the YOLO network model has the same function as a pooling layer in a convolutional neural network: a single value replaces the pixels of the corresponding area. The YOLO network model uses max pooling, replacing each image area after convolution with the maximum value of that area, which reduces redundant data and prevents overfitting.
The output layer, the last layer of the YOLO network model, is similar to the SoftMax classifier in a CNN in that it classifies the fully connected layer's data, with the number of output feature maps equal to the number of target classes. The difference is that the YOLO output layer produces a 7 * 7 * 30 tensor: the 7 * 7 corresponds to the 7 * 7 grid of the input layer, and the 30 values encode the classification result and position information of the object in the image. Finally, the tensor is decoded according to a fixed convention to draw the detection results on the original image.
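As an illustration of the structure described above, the following is a minimal sketch of a YOLOv1-style network written in PyTorch (an assumed framework, not code from this paper). The layer counts are abbreviated rather than the full 24-convolutional-layer configuration, but the sketch shows the 448 * 448 * 3 input, the 3 * 3 and 1 * 1 convolution kernels, max pooling, and a fully connected head that produces the 7 * 7 * 30 prediction tensor.

import torch
import torch.nn as nn

class TinyYOLOv1(nn.Module):
    # Abbreviated YOLOv1-style backbone and head; hypothetical class, for illustration only.
    def __init__(self, S=7, B=2, C=20):
        super().__init__()
        self.S, self.B, self.C = S, B, C
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.LeakyReLU(0.1),
            nn.MaxPool2d(2, 2),                        # max pooling keeps the strongest response
            nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.LeakyReLU(0.1),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(192, 128, kernel_size=1), nn.LeakyReLU(0.1),   # 1 * 1 kernel reduces channels
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.LeakyReLU(0.1),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1), nn.LeakyReLU(0.1),
            nn.MaxPool2d(2, 2),
            nn.AdaptiveAvgPool2d((S, S)),              # force an S * S spatial grid
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * S * S, 4096), nn.LeakyReLU(0.1),
            nn.Linear(4096, S * S * (B * 5 + C)),      # 7 * 7 * 30 when S=7, B=2, C=20
        )

    def forward(self, x):
        # x: a batch of 448 * 448 RGB images, shape (N, 3, 448, 448)
        out = self.head(self.features(x))
        return out.view(-1, self.S, self.S, self.B * 5 + self.C)

if __name__ == "__main__":
    net = TinyYOLOv1()
    pred = net(torch.randn(1, 3, 448, 448))
    print(pred.shape)  # torch.Size([1, 7, 7, 30])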

1.2.2 Detection Process


The YOLO algorithm divides the input image into an S * S grid, and each grid cell is responsible for detecting targets whose center points fall within it and for producing the corresponding bounding boxes. Each bounding box is described by five prediction parameters: the center point offset (x, y), the width and height (w, h), and the confidence (Confidence).
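As a sketch of how this grid encoding can be interpreted (using the same assumed PyTorch setting as above and a hypothetical helper name, not code from the paper), the following decodes the five parameters of one predicted box per grid cell into image coordinates.

import torch

def decode_boxes(pred, img_size=448, S=7):
    # pred: an (S, S, 30) prediction tensor; for simplicity only the first
    # predicted box (channels 0-4) of each grid cell is decoded here.
    cell = img_size / S
    boxes = []
    for row in range(S):
        for col in range(S):
            x_off, y_off, w, h, conf = pred[row, col, :5].tolist()
            # (x_off, y_off) locate the box centre inside its cell;
            # (w, h) are relative to the whole image.
            cx = (col + x_off) * cell
            cy = (row + y_off) * cell
            bw, bh = w * img_size, h * img_size
            boxes.append((cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, conf))
    return boxes

# usage with the network sketched above:
# detections = [b for b in decode_boxes(net(images)[0]) if b[4] > 0.5]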
The confidence is calculated as shown in Formula (1)

C = Pr(Object) ∗ IoU (1)

In Formula (1), Pr(Object) represents the probability that a target exists in the grid cell's bounding box, Object denotes the target object, and IoU (Intersection over Union) measures the positional accuracy of the bounding box predicted by the current model; its expression is shown in Formula (2).

IoU = (box(pre) ∩ box(true)) / (box(pre) ∪ box(true)) (2)
In Formula (2), box(pre) represents the predicted bounding box and box(true) represents the ground-truth bounding box.
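For concreteness, the following is a small sketch of Formulas (1) and (2) in Python (an illustration, not code from the paper), with boxes assumed to be axis-aligned and given as (x1, y1, x2, y2) corner coordinates.

def iou(box_pre, box_true):
    # Formula (2): intersection area divided by union area of the two boxes.
    x1 = max(box_pre[0], box_true[0])
    y1 = max(box_pre[1], box_true[1])
    x2 = min(box_pre[2], box_true[2])
    y2 = min(box_pre[3], box_true[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)    # box(pre) ∩ box(true)
    area_pre = (box_pre[2] - box_pre[0]) * (box_pre[3] - box_pre[1])
    area_true = (box_true[2] - box_true[0]) * (box_true[3] - box_true[1])
    union = area_pre + area_true - inter             # box(pre) ∪ box(true)
    return inter / union if union > 0 else 0.0

def confidence(pr_object, box_pre, box_true):
    # Formula (1): C = Pr(Object) * IoU
    return pr_object * iou(box_pre, box_true)

print(confidence(1.0, (0, 0, 100, 100), (50, 50, 150, 150)))  # approximately 0.143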

2 Results

2.1 Manual Control


The unmanned car can be remotely controlled in several ways, including app remote control, a wired PS2 controller, and a model-aircraft radio controller, as shown in Fig. 2. These approaches provide manual operation, and the reaction time is acceptable.

Fig. 2. Prototype

2.2 Radar Mapping

As shown in Fig. 3, the unmanned car is able to create a 2D map with the laser radar. However, since in our project the car operates in a known field, the laser radar is mainly used for obstacle avoidance rather than mapping.

2.3 Guiding and Avoidance

In Fig. 4, the unmanned car is required to move from right to left, and the map is given in advance. The car encounters two obstacles, one square and the other round. It moves smoothly past both obstacles and reaches the destination successfully.
In Fig. 5, the car encounters a moving obstacle (a human). It maintains a safe distance from the human, and its re-planning reaction is sufficiently fast.

Fig. 3. 3D Mapping test

Fig. 4. Guiding showcase

Fig. 5. Showcase of moving obstacle



3 Conclusion
The intelligent unmanned car is equipped with a depth camera and a laser radar, which meet the demands of our team project. The prototype was challenged with multiple tasks, including mapping, guiding, obstacle avoidance, and visual following, proving that this technology serves well in the mission of transporting the UAVs and interacting with human operators.
According to the decomposition of the task process of the UAV distribution center, the unmanned vehicle needs to participate in three links: transporting the UAV out of the warehouse to the take-off site, retrieving the UAV at the landing site and transporting it back to the designated location (the warehouse), and responding to command intervention by the recipient. From the results, our unmanned vehicle meets the demands of autonomous navigation and realizes autonomous path planning based on known map information. It also achieves autonomous obstacle avoidance, perceiving the surrounding environment so that it can avoid moving objects and plan routes reasonably. In addition, it provides a human-targeted visual tracking function to improve the performance of human-computer interaction.

Acknowledgements. This work is supported by the Chinese National Natural Science Foundation (No. 61773039), the Aeronautical Science Foundation of China (No. 2017ZDXX1043), and the Aeronautical Science Foundation of China (No. 2018XXX).

References
1. Guilmartin, J.F.: Unmanned aerial vehicle. Encyclopedia Britannica, 15 July 2020. https://
www.britannica.com/technology/unmanned-aerial-vehicle. Accessed 15 May 2021
2. Dean, J., Mixter, J., Barr, J.: Multi-Rotor Unmanned Aerial Vehicle. Western Michigan University, 12 August 2015. https://scholarworks.wmich.edu/cgi/viewcontent.cgi?article=3658&context=honors_theses
3. https://www.droneomega.com/what-is-a-quadcopter/
4. Lukmana, M.A., Nurhadi, H.: Preliminary study on Unmanned Aerial Vehicle (UAV) Quad-
copter using PID controller. In: 2015 International Conference on Advanced Mechatronics,
Intelligent Manufacture, and Industrial Automation (ICAMIMIA), pp. 34–37 (2015). https://
doi.org/10.1109/ICAMIMIA.2015.7507997
5. . (1), 72–74
(2014)
6. Isard, M., Blake, A.: Condensation—conditional density propagation for visual tracking. Int.
J. Comput. Vis. 29(1), 5–28 (1998)
7. Mei, X., Ling, H.: Robust visual tracking using ℓ1 minimization. In: IEEE International Conference on Computer Vision. IEEE (2010)
8. Wang, N., Wang, J., Yeung, D.: Online robust non-negative dictionary learning for visual
tracking. In: IEEE International Conference on Computer Vision. IEEE (2013)
9. Hare, S., Saffari, A., Torr, P.H.S.: Struck: structured output tracking with kernels. In: IEEE
International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, 6–13 November
2011. IEEE (2011)
10. Zhang, J., Ma, S., Sclaroff, S.: MEEM: robust tracking via multiple experts using entropy
minimization (2014)

11. Kalal, Z., Mikolajczyk, K., Matas, J.: Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell. 34(7), 1409–1422 (2012)
12. Bolme, D.S., Beveridge, J.R., Draper, B.A., et al.: Visual object tracking using adaptive correlation filters. In: CVPR, pp. 2544–2550 (2010)
13. Henriques, J.F., Caseiro, R., Martins, P., et al.: High-speed tracking with kernelized correlation
filters. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 583–596 (2015)
14. Danelljan, M., Häger, G., Khan, F.S., et al.: Learning spatially regularized correlation filters for visual tracking (2016)
15. Li, F., Tian, C., Zuo, W., et al.: Learning spatial-temporal regularized correlation filters for
visual tracking (2018)
16. Danelljan, M., Hager, G., Khan, F.S., et al.: Convolutional features for correlation filter based
visual tracking. In: 2015 IEEE International Conference on Computer Vision Workshop
(ICCVW). IEEE (2015)
17. Lukezic, A., Vojir, T., Zajc, L.C., et al.: Discriminative correlation filter with channel and
spatial reliability. In: The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pp. 6309–6318 (2017)
18. Sun, C., Wang, D., Lu, H., et al.: Learning spatial-aware regressions for visual tracking (2017)
19. Danelljan, M., Robinson, A., Khan, F.S., et al.: Beyond correlation filters: learning continuous
convolution operators for visual tracking (2016)
20. Danelljan, M., Bhat, G., Khan, F.S., et al.: ECO: efficient convolution operators for tracking
(2016)
21. Wang, N., Zhou, W., Tian, Q., et al.: Multi-cue correlation filters for robust visual tracking. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE (2018)
22. Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking
(2015)
23. Held, D., Thrun, S., Savarese, S.: Learning to track at 100 FPS with deep regression networks.
In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 749–
765. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_45
24. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional
siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol.
9914, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56
25. Li, B., Yan, J., Wu, W., et al.: High performance visual tracking with siamese region pro-
posal network. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR). IEEE (2018)
26. Wang, Q., Zhang, L., Bertinetto, L.: Fast online object tracking and segmentation: a unifying
approach. In: IEEE/CVF Conference on Computer Vision & Pattern Recognition. IEEE (2019)
27. Li, B., Wu, W., Wang, Q., et al.: SiamRPN++: evolution of siamese visual tracking with very
deep networks. In: IEEE/CVF Conference on Computer Vision & Pattern Recognition. IEEE
(2019)
28. Guo, Q., Feng, W., Zhou, C., et al.: Learning dynamic siamese network for visual object
tracking. In: International Conference on Computer Vision (ICCV 2017). IEEE Computer
Society (2017)
29. Yang, T., Chan, A.B.: Learning dynamic memory networks for object tracking. In: Ferrari, V.,
Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 153–169.
Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_10
30. Zhu, Z., Wang, Q., Li, B., et al.: Distractor-aware siamese networks for visual object tracking
(2018)
31. Danelljan, M., Bhat, G., Khan, F.S., et al.: ATOM: accurate tracking by overlap maximization.
In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE
(2019)

32. Bhat, G., Danelljan, M., Gool, L.V., et al.: Learning discriminative model prediction for
tracking. In: ICCV (2019)
33. : .
(2019)
34. Bolme, D.S., Beveridge, J.R., Draper, B.A., et al.: Visual object tracking using adaptive
correlation filters. In: Proceedings of IEEE Conference on Computer Vision and Pattern
Recognition, San Francisco, USA, pp. 2544–2550. IEEE (2010)
