UAV Based Target Tracking and Recognition
Abstract: In this paper, we develop a quadrotor UAV based target tracking and recognition system, which includes an intelligent gimbal sub-system for accurate camera positioning and fast image processing. A set of robust consensus-based algorithms is developed for object tracking, in addition to moving-background processing techniques. A neural network learning based database is used to improve target recognition performance. Moreover, a Geographic Information System (GIS) is used to provide geo-location, environmental, and contextual information for the tracked objects. Experimental and simulation results have demonstrated the robustness of the proposed target tracking and recognition framework.

I. INTRODUCTION
In recent years, quadrotor UAVs have become increasingly widely used in both military and civil applications due to their small size, low cost, high maneuverability, and fast response [1], so that they can replace pilots and rescue teams in dangerous missions. Civil applications include aerial crop surveys, inspection of power lines and pipelines, forest fire detection and monitoring, search and rescue, aerial photography, etc. For military applications, UAVs can complete complex tasks such as transporting materials, battlefield surveillance, border patrol, electronic warfare, etc. With the recent advances in MEMS, artificial intelligence, digital communications, and sensing technology, UAVs have found more applications in many areas. Besides, computer vision techniques further enable intelligent UAV applications.
UAV based autonomous target identification and tracking poses many technical challenges. First, as targets are moving within a changing background, it is important to develop real-time algorithms for varying-background video processing. Second, since the resolution of human facial images is usually low due to the long distance between the UAV and the target, identifying targets with high accuracy from low-resolution facial images becomes a key problem in real-time aerial surveillance for law enforcement agencies. Third, target occlusion by the changing background is very difficult to handle, since it is hard to determine whether the target has left the camera Field of View (FoV) or is occluded by other objects. Besides, algorithms that do not fully utilize object and GIS databases achieve limited identification and tracking performance.
In [2], a set of algorithms is developed to detect and track objects with pre-known shapes for UAVs. The scheme has been proved useful for tracking indoor targets below certain heights. In [3], [4], [5], GIS databases are utilized for UAV navigation systems to generate tracking results in terms of global coordinates. The experimental results illustrate the improved tracking performance with the help of GIS databases. [6] gives a state-of-the-art consensus-based tracker which is capable of tracking deformable objects in real time. A real-time object tracking and identification system with a convolutional neural network running on FPGA is proposed in [7]. The system performance is robust to various environment and target conditions but only suitable for near-distance objects.
In this paper, we propose a framework for real-time UAV based object tracking and recognition. Compared with most tracking systems described above, we develop a more complete and robust system with both a GIS environment database and a neural network based object database. Besides, a set of consensus-based algorithms is utilized for target tracking, which shows better performance than other algorithms in case of occlusion.
The contributions of this paper include
1) Design a UAV based target tracking and recognition system with an intelligent gimbal sub-system;
2) Develop a consensus-based visual tracking algorithm to achieve robust target tracking, as well as image mosaicking techniques to deal with moving background;
3) Utilize a GIS database to provide environmental and contextual information, and improve tracking accuracy;
4) Utilize a neural network based object database for fast feature matching, and high-accuracy target recognition.
This paper is organized as follows. Section II describes the whole system architecture and states the problems; Section III presents the target tracking and recognition framework, including image mosaicking, the consensus-based tracking algorithm, the GIS database, and the neural network based database; Section IV provides the results and discussions. Section V concludes the paper and outlines future work.

*Supported by the Shenzhen Fundamental Research Program and Southern University of Science and Technology Research Committee.
1 Tian Xiang, Fan Jiang, Gongjin Lan, Jiaming Sun, Guocheng Liu and Qi Hao are with the Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
2 Cong Wang is with the School of Automation Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510641, China

II. SYSTEM SETUP AND PROBLEM STATEMENT
A. UAV based Target Tracking and Recognition
Fig. 1 illustrates the UAV based target tracking and recognition system, which contains the UAV system (flight control + gimbal sub-system) and the ground station. The real-time object tracking and recognition scheme is performed within the UAV system, which implements a set of consensus-based algorithms for target tracking given video images, utilizes a database for feature matching, and uses various sensors for controlling UAV flights and gimbal motions. On the other side, the ground station receives the video stream and the position and orientation information of the quadrotor and gimbal, and fuses them with the GIS database to achieve accurate target geo-locations.

UAV points the onboard camera onto the target and continuously acquires the target images, after receiving the instruction on the target initial position, texture, and size from the ground station; under the autonomous mode, the UAV automatically determines the targets of interest, tracks their movements, continuously acquires their images, and preprocesses the data, including white balance and de-noising.
2) Video mosaicking: to deal with the moving backgrounds in the UAV videos, consecutive images within a time window are aligned and stitched together into a panoramic image to get the static background image, such that the targets can be localized more accurately [9].
3) Target recognition: extract the feature points of the images, and match those features with the target models stored in the database; if the matching is successful, then continue the target tracking, otherwise re-localize the target from the images for subsequent processing.
4) Target tracking: implement the consensus-based algorithm to perform the online target detection, localization, tracking and prediction; estimate the motion and the shape of the target continuously.
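The decision in the target recognition step (match features against a stored model, then either keep tracking or re-localize) can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-neighbour ratio test, the function names, the `ratio` threshold and the `min_matches` count are all assumptions chosen for illustration, and descriptors are plain tuples of floats rather than real SIFT vectors.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptors (equal-length tuples)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_matches(query, model, ratio=0.8):
    """Return (query_idx, model_idx) pairs whose best match is clearly
    better than the second-best one (nearest-neighbour ratio test).
    `ratio` is an assumed threshold, not a value from the paper."""
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((l2(q, m), mi) for mi, m in enumerate(model))
        if len(ranked) < 2:
            continue  # the test needs at least two candidates
        (d1, best), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches

def recognition_step(query, model, min_matches=3):
    """Hypothetical gate: 'track' if enough descriptors match the stored
    target model, otherwise 'relocalize'."""
    matches = ratio_test_matches(query, model)
    action = "track" if len(matches) >= min_matches else "relocalize"
    return action, matches

# Toy data: a stored model with four well-separated descriptors.
model = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
seen = [(0.1, 0.0), (10.0, 0.2), (0.0, 9.9)]   # close to model entries
ambiguous = [(5.0, 5.0)]                        # equally far from all entries
print(recognition_step(seen, model)[0])
print(recognition_step(ambiguous, model)[0])
```

An ambiguous descriptor is rejected because its best and second-best distances are nearly equal, which is what keeps a cluttered background from producing a false "recognized" decision in this sketch.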
The hardware components of the gimbal system include the core processor, various sensors (IMU, barometer, GPS, camera), actuators (brushless motors) and wireless transceiver, as shown in Fig. 3.

[Fig. 3: hardware block diagram of the gimbal system; recoverable labels: barometer, gyroscope, accelerometer, IMU, GPS, camera, brushless motor, wireless transceiver]

III. TARGET TRACKING AND RECOGNITION
A. Image Mosaicking
In order to align the sequential frames, there must be common features within the captured images that can be matched between neighboring image frames. The SIFT algorithm is used to implement the feature matching [10]; its features are invariant to scale and orientation and robust to moderate affine distortion, so that it can produce a more precise matching between successive video frames.
After the feature matching between successive frames is finished, the matched feature points are used to compute a geometric transformation matrix for warping the images so that they can be aligned with the previous frames. Finally, consecutive aligned images are stitched together into a panoramic image. The process is described in Fig. 5.
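The alignment step can be made concrete with a small sketch. The text does not say which transformation model is used; the code below assumes a planar homography (a 3x3 matrix with its last entry fixed to 1) estimated from exactly four point correspondences. All function names are hypothetical, and a practical system would use many SIFT matches with an outlier-rejection scheme rather than four hand-picked points.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):          # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def homography_from_4_points(src, dst):
    """Estimate H (h33 = 1) such that dst[i] ~ H * src[i] for 4 point pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def warp_point(h, p):
    """Map a point from the new frame into the reference frame."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Toy correspondences: the new frame is scaled by 2 and shifted by (2, 3).
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 3), (4, 3), (4, 5), (2, 5)]
H = homography_from_4_points(src, dst)
print(warp_point(H, (0.5, 0.5)))
```

Warping every pixel of the new frame with the estimated matrix places it in the coordinate system of the previous frames, after which overlapping frames can be blended into the panoramic background image.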
severely disrupt the functions of subsequent video processing and target-following UAV flight control. To tackle this problem, an Extended Kalman Filter (EKF)-based state observer is employed with the output of the tracking algorithm. This ensures a steady and contin-

[Flowchart residue; recoverable labels: Target recognition? (Yes / No); Transfer the video images and geographical information; Search in the GIS database]
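The role of the state observer described above can be sketched as follows. The state and measurement models of the EKF are not given in this section, so the code assumes the simplest case: a linear constant-velocity Kalman filter on one image coordinate, which is what an EKF reduces to when both models are linear. The class name, the noise parameters `q` and `r`, and the use of `None` to mark frames without a tracker output are all illustrative assumptions.

```python
class ConstantVelocityKF:
    """1-D constant-velocity Kalman filter; state = [position, velocity]."""

    def __init__(self, pos, vel=0.0, p=10.0, q=0.01, r=1.0):
        self.x = [pos, vel]
        self.P = [[p, 0.0], [0.0, p]]   # state covariance
        self.q, self.r = q, r           # assumed process / measurement noise

    def predict(self, dt=1.0):
        pos, vel = self.x
        self.x = [pos + dt * vel, vel]
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def update(self, z):
        # Measurement model H = [1, 0]: only the position is observed.
        s = self.P[0][0] + self.r
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s
        y = z - self.x[0]
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        (a, b), (c, d) = self.P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]

def observe(measurements, dt=1.0):
    """Filter one coordinate of the tracker output. A value of None marks a
    frame where the tracker lost the target; the filter then only predicts."""
    first = next(z for z in measurements if z is not None)
    kf = ConstantVelocityKF(first)
    estimates = []
    for z in measurements:
        kf.predict(dt)
        if z is not None:
            kf.update(z)
        estimates.append(kf.x[0])
    return estimates

# Toy track: 2 pixels per frame, with frames 12 to 15 occluded.
track = [2.0 * k for k in range(20)]
track[12:16] = [None] * 4
estimates = observe(track)
print(round(estimates[15], 2))
```

During the occluded frames the observer keeps extrapolating with the last velocity estimate, so the gimbal and flight controller still receive a smooth position signal instead of a gap.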
Since the position and orientation of the UAV's camera are not

Target 3D reconstruction

, as shown in Fig. 9.

Fig. 6. Target global coordinates and static background

The ground station first receives a rough estimate of target geo-location information from the UAV based on the GPS and IMU readings, which can be used to estimate the camera
database to achieve long-distance and high-precision target recognition performance for UAVs. The system setup is illustrated in Fig. 10.

[Fig. 10 diagram; recoverable labels: GIS localization, GIS, camera, NVIDIA TX1, sensor networks, neural network, database, targets recognition and tracking]
Fig. 10. Integrating GIS and neural network based databases

targets, and reduce the feature distances for the same target, and update the target database, as shown in Fig. 11.

[Figure residue; recoverable labels: Neural Network; Car, F1 Car, Racing, Convertible, Trailer, Human]

The target classifier is first trained on the ImageNet [11] dataset, then fine-tuned on a pedestrian/vehicle dataset that we captured ourselves. This proprietary dataset not only includes various kinds of targets, but also takes into account a variety of target poses and surroundings. The recognition neural network consists of five convolution layers and two fully connected layers. Meanwhile, a similar neural network with three convolution layers is trained in parallel, which trades accuracy for speed.

[Fig. 12 diagram; recoverable labels: neural network ground workstation, neural network on UAV, target data set, neural network update]
Fig. 12. Offline and online training architecture of neural network
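The speed versus accuracy trade-off between the five-layer and three-layer classifiers can be made tangible by counting parameters and multiply-accumulate operations. Only the layer counts (five or three convolution layers, two fully connected layers) come from the text; the input size, channel widths, kernel sizes, strides and the number of output classes below are hypothetical values chosen purely for illustration.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1

def cnn_cost(input_hw, in_ch, conv_layers, fc_layers):
    """Return (parameters, multiply-accumulates) for a stack of convolution
    layers followed by fully connected layers.
    conv_layers: list of (out_channels, kernel, stride, pad)
    fc_layers:   list of output sizes"""
    h = w = input_hw
    ch = in_ch
    params = macs = 0
    for out_ch, k, s, p in conv_layers:
        h, w = conv_out(h, k, s, p), conv_out(w, k, s, p)
        params += out_ch * (ch * k * k + 1)        # weights + biases
        macs += out_ch * ch * k * k * h * w
        ch = out_ch
    features = ch * h * w
    for out in fc_layers:
        params += out * (features + 1)
        macs += out * features
        features = out
    return params, macs

# Hypothetical configurations (NOT the paper's): 64x64 RGB input, 6 classes.
FULL = [(16, 3, 1, 1), (32, 3, 2, 1), (64, 3, 2, 1), (128, 3, 2, 1), (128, 3, 2, 1)]
LIGHT = [(16, 3, 2, 1), (32, 3, 2, 1), (64, 3, 2, 1)]
FC = [256, 6]

full_params, full_macs = cnn_cost(64, 3, FULL, FC)
light_params, light_macs = cnn_cost(64, 3, LIGHT, FC)
print("five conv layers :", full_params, "params,", full_macs, "MACs")
print("three conv layers:", light_params, "params,", light_macs, "MACs")
```

Under these assumed settings the three-layer variant needs several times fewer multiply-accumulates per frame, which is the kind of saving that matters on an embedded onboard computer; the accuracy cost has to be measured on the actual dataset.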
IV. EXPERIMENT RESULTS
Fig. 15. Comparison of the number of features tracked using the proposed algorithm and CMT

in Fig. 14. We compared the performance of our algorithm against CMT using two criteria: the number of feature points tracked and the total number of frames with tracking loss. The results are shown in Fig. 15 and Fig. 16. Total occlusion by land obstacles (trees, bridges, etc.) is marked in red.

Fig. 16. Comparison of the cumulative number of tracking losses between the proposed algorithm and CMT

From Fig. 15, we can see that the number of feature points tracked by our algorithm is consistently far larger than that of CMT. Recovery from tracking loss due to occlusions is also faster. The unstable number of features is due to the rapid update of the model, whose learning rate still needs better tuning.
Fig. 16 shows the total number of frames that do not have a good target lock. The lower cumulative tracking loss indicates that our algorithm has a significantly lower probability of losing track under partial occlusions by nearby vehicles and road signs in urban environments, demonstrating its robustness.
At the current stage, we are performing more experiments for human and vehicle tracking and recognition with the proposed UAV-based system. More results will be obtained in the near future. With the help of GIS and neural network databases, the system performance has been much improved.

V. CONCLUSIONS
This paper presents a framework for UAV based target tracking and recognition. The intelligent gimbal system is designed to provide a platform for object tracking and recognition. Within our scheme, SIFT is used to extract the image feature points and descriptors, and a robust visual tracking algorithm using consensus-based temporal learning is applied to realize the tracking process. Besides, an Extended Kalman Filter (EKF)-based state observer is employed to solve the target occlusion problem. The GIS database has been integrated within the scheme to improve the target localization and prediction accuracy, and to provide target global coordinates. The neural network target database has been integrated to improve target recognition performance. Experiment results demonstrate that the proposed algorithm can achieve better performance than CMT in various challenging situations.

ACKNOWLEDGMENT
This paper is partly supported by Southern University of Science and Technology Research Committee (No. FRG-SUSTC1501A-29 and FRG-SUSTC1501A-44). We are also grateful for the precious help from our lab engineers Yunbo Yang and Miaolin Hou.

REFERENCES
[1] S. Islam, P. X. Liu, and A. El Saddik, "Robust control of four-rotor unmanned aerial vehicle with disturbance uncertainty," IEEE Transactions on Industrial Electronics, vol. 62, no. 3, pp. 1563-1571, 2015.
[2] K. Boudjit and C. Larbes, "Detection and implementation autonomous target tracking with a quadrotor AR.Drone," in 12th International Conference on Informatics in Control, Automation and Robotics (ICINCO), vol. 2, July 2015, pp. 223-230.
[3] D.-Y. Gu, C.-F. Zhu, J. Guo, S.-X. Li, and H.-X. Chang, "Vision-aided UAV navigation using GIS data," in IEEE International Conference on Vehicular Electronics and Safety (ICVES), July 2010, pp. 78-82.
[4] C.-F. Zhu, S.-X. Li, H.-X. Chang, and J.-X. Zhang, "Matching road networks extracted from aerial images to GIS data," in Asia-Pacific Conference on Information Processing (APCIP), vol. 2, July 2009, pp. 63-66.
[5] N. Rackliffe, H. A. Yanco, and J. Casper, "Using geographic information systems (GIS) for UAV landings and UGV navigation," in IEEE Conference on Technologies for Practical Robot Applications (TePRA), April 2011, pp. 145-150.
[6] G. Nebehay and R. Pflugfelder, "Clustering of static-adaptive correspondences for deformable object tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2784-2791.
[7] R. Ghosh, A. Mishra, G. Orchard, and N. V. Thakor, "Real-time object recognition and orientation estimation using an event-based camera and CNN," in IEEE Biomedical Circuits and Systems Conference (BioCAS), Oct. 2014, pp. 544-547.
[8] L. Meier, P. Tanskanen, F. Fraundorfer, and M. Pollefeys, "Pixhawk: A system for autonomous flight using onboard computer vision," in IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 2992-2997.
[9] R. Patil, P. E. Rybski, T. Kanade, and M. M. Veloso, "People detection and tracking in high resolution panoramic video mosaic," in Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004), vol. 2, 2004, pp. 1323-1328.
[10] Y. Ke and R. Sukthankar, "PCA-SIFT: A more distinctive representation for local image descriptors," in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), vol. 2, 2004, pp. II-506.
[11] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li, "Imagenet: A large-scale hierarchical image database," in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248-255.