Air-to-Air Visual Detection of Micro-UAVs: An Experimental Evaluation of Deep Learning
Fig. 2. Samples of images in the dataset and the corresponding detection results by the eight algorithms. The dataset contains four types of background scenes: sky, mountain, field, and urban. The areas detected by the eight algorithms are given to the right of each sample image with color-coded boxes. If the corresponding area is blank, it means that the algorithm did not detect any target UAV in that image.
small (e.g., less than 10 × 10 pixels), which thus increases the difficulty of detection.

The existing approaches for UAV detection can be classified into two streams. The first stream is the conventional approaches, which are composed of two-step operations. The first step is to extract object features represented by, for example, the Histogram of Oriented Gradients (HOG) or the Scale Invariant Feature Transform (SIFT). The second step is to classify the features using machine-learning algorithms such as the Support Vector Machine (SVM) or AdaBoost. The second stream is the deep-learning-based approaches, which directly output detection results using end-to-end artificial neural networks. In contrast to the conventional approaches, which use hand-crafted features, deep-learning-based approaches rely on deep convolutional neural network (DCNN) features and consequently have a stronger capability to represent complex objects. However, the disadvantage of using DCNNs is that they have high computational requirements and require large datasets to train. A detailed review of the existing approaches is given in Section II.

Although deep learning methods have exhibited superior performance in many object detection tasks, their potential for UAV detection has not been well explored or evaluated up to now (see Section II-B for a review). As the first step towards establishing a robust approach to air-to-air UAV detection, this paper proposes a new dataset of micro-UAV images and presents a comprehensive experimental evaluation of eight representative deep-learning algorithms. It is worth noting that we focus on the case where the target UAVs are known in advance such that a dataset of them can be built up for the purpose of training. This case applies to tasks like vision-based cooperative control of multi-UAV systems, which is our main motivation for UAV detection. Although the algorithms exhibit a certain generalization ability to detect unknown UAVs with similar appearances, other measures such as building up datasets of multiple types of UAVs or target motion sensing [2] may be required.

The novelty and contribution of this work are detailed as follows.
First, this paper presents a dataset of 13 271 images of a flying target UAV (DJI Mavic) acquired by another flying UAV (DJI M210). Compared to the existing air-to-air datasets, the proposed one is more systematically designed and comprehensive in the sense that it covers a wide range of practical scenarios with different background scenes, viewing angles, relative distances, flying altitudes, and lighting conditions. In particular, the environmental background scenes vary from simple ones such as clear sky to complex ones such as mountain, field, and urban. The relative distance of the target UAV varies from 10 m to 100 m, and the flight altitude from 20 m to 110 m. Since lighting conditions are also an important factor in flying UAV detection, the time of data collection varies from morning to evening across different periods of the day. The dataset also covers some challenging scenarios with, for example, strong light, motion blur, and partial target occlusion.

Second, this paper presents an experimental evaluation of eight representative deep-learning algorithms based on our proposed dataset: RetinaNet [8], SSD [9], YOLOv3 [10], FPN [11], Faster R-CNN [12], RefineDet [13], Grid R-CNN [14], and Cascade R-CNN [15]. To the best of our knowledge, this is the first comprehensive evaluation of deep learning algorithms for UAV detection tasks. The evaluation results suggest that the overall performance of Cascade R-CNN and Grid R-CNN is superior to that of the others. We also evaluate the impact of some key factors such as background scene complexity, relative viewing angles, and target scales on the detection performance.

The proposed dataset could be used as a benchmark to evaluate different UAV detection algorithms (either conventional or deep-learning-based). The evaluation results highlight some key challenges in the problem of air-to-air UAV detection and suggest potential ways to develop new algorithms in the future.

II. RELATED WORK

This section gives a review of the existing studies on visual detection of micro UAVs. We only consider the case of monocular cameras.

A. Conventional Approaches

The conventional techniques adopted by existing UAV detection works can be classified into two categories. The first is to use feature extraction methods to obtain target features, and then use a discriminative classifier to determine the target location. The second is to detect moving objects in the image, and then use a generative classifier to determine whether the moving object is the target.

In particular, the work in [16] adopts Haar-wavelet-based AdaBoost to detect UAVs. The approach is demonstrated by flight experiments to be effective in the simple case of a cloudy sky background. The work in [17] proposes a cascade approach to detect UAVs based on Haar-like features, local binary patterns, and HOG. Since it is a combination of different detection methods, this approach has a low running speed. The HOG feature is adopted in [18] for training classical cascade detectors. Although this approach significantly reduces the number of repeated detections by applying non-maximum suppression, the detection accuracy drops rapidly in the case of partial occlusion. Motivated by moving object detection in see-and-avoid tasks, the work in [19] utilizes optical flow matching to integrate spatial and temporal information to track moving targets. This approach requires high-precision motion compensation. The optical flow method is also used to locate moving objects in [20]; the subsequent step recognizes the moving objects by template matching, which is not robust to variance in the target appearance. The work in [21] also adopts template matching as well as morphological filtering for UAV detection. A real-time detection and tracking strategy is proposed in [22], where the object of interest is automatically detected in a saliency map by computing a background connectivity cue at each frame. The work in [23] proposes a pyramidal Lucas-Kanade (PLK) algorithm to detect moving targets in a team of cooperative UAVs. The work in [24] detects moving targets by extracting geometric and dynamic features from the segmented image and classifies them with a discriminant function derived from Bayes' theorem.

In summary, although UAV detection has been studied with many conventional approaches, these approaches are effective only in restricted scenarios where, for example, the background scene is relatively simple or the target appearance does not vary considerably.
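To make the conventional two-step pipeline concrete, the following is a minimal sketch of a sliding-window HOG + SVM detector of the generic kind reviewed above. It is not the method of any cited work; the window size, stride, and classifier settings are illustrative assumptions, and it uses OpenCV's HOGDescriptor together with scikit-learn's LinearSVC.

# Minimal sketch of the conventional two-step pipeline: (1) extract HOG
# features, (2) classify windows with an SVM. All settings are assumptions.
import cv2
import numpy as np
from sklearn.svm import LinearSVC

WIN = (64, 64)  # assumed window size for a small UAV target
hog = cv2.HOGDescriptor(WIN, (16, 16), (8, 8), (8, 8), 9)

def hog_feature(patch):
    # Resize the patch to the fixed window size and compute its HOG descriptor.
    return hog.compute(cv2.resize(patch, WIN)).ravel()

def train(pos_patches, neg_patches):
    # Fit a linear SVM on labeled UAV / background patches.
    X = np.array([hog_feature(p) for p in pos_patches + neg_patches])
    y = np.array([1] * len(pos_patches) + [0] * len(neg_patches))
    return LinearSVC(C=1.0).fit(X, y)

def detect(image, clf, step=16):
    # Slide a window over the image and keep positively classified locations.
    detections = []
    h, w = image.shape[:2]
    for y in range(0, h - WIN[1], step):
        for x in range(0, w - WIN[0], step):
            patch = image[y:y + WIN[1], x:x + WIN[0]]
            score = clf.decision_function([hog_feature(patch)])[0]
            if score > 0:
                detections.append((x, y, WIN[0], WIN[1], score))
    return detections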
B. Deep-Learning Approaches

Although methods based on deep learning have made great progress in the field of general object detection, they have not been well explored in the field of UAV detection. Up to now, there are only a few studies on visually detecting UAVs with deep learning algorithms. For example, an approach to detect flying objects using motion compensation is proposed in [25], where the features of moving objects are classified by CNNs. This approach leads to high average detection precision, but the motion compensation step requires high-precision measurement of the motion of the camera. The work in [26] combines SegNet with bottom-hat morphological processing for detecting large-size aircraft in the air. This approach can detect aircraft at a long range of up to 2800 m, but the accuracy is as low as 13.4%. Although some other studies such as [27], [28] also adopt deep learning algorithms such as YOLOv2 to detect UAVs, the performance of different representative deep learning algorithms for UAV detection has not been evaluated or compared.

C. Existing Datasets for UAV Detection

Up to now, there are very few comprehensive datasets for the purpose of training deep learning algorithms for UAV detection. The dataset in [29] comprises 20 video sequences, each of which has about 4000 gray-scale frames of 752 × 480 pixels. The images of the flying target UAV are captured by a camera mounted on another UAV in indoor and outdoor environments. The dataset proposed in [30] consists of two sub-datasets. The first is a Public-Domain drone dataset that contains 30 video sequences with different drone models captured in indoor and outdoor environments. The other one is the USC drone dataset, which contains 30 video clips of the same target UAV. This dataset is acquired on the USC campus.
TABLE I
THE HYPERPARAMETERS IN OUR IMPLEMENTATION OF THE EIGHT ALGORITHMS
[34], we choose DarkNet-53 for YOLOv3 in our experiments. The original optimizers are used. The learning rate (LR), momentum, weight decay, and number of iterations are finely tuned based on extensive tests.

Our experiments are implemented on a computer with an Intel i7 CPU, 32 GB of RAM, and an Nvidia RTX 2080Ti GPU rather than an embedded computer in order to reduce training time. We train the models on 70% of the images, of which 10% are used for validation, and test them on the remaining 30% of the images.
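As a concrete illustration, the snippet below partitions a list of image paths into 70% training and 30% testing, with 10% of the training portion held out for validation. This is one possible reading of the split described above; the shuffling and the exact proportions of the held-out validation set are assumptions, not the authors' script.

# Sketch of the 70% train / 30% test split, with 10% of the training
# portion held out for validation (one reading of the split described above).
import random

def split_dataset(image_paths, seed=0):
    rng = random.Random(seed)
    paths = list(image_paths)
    rng.shuffle(paths)
    n_test = int(0.3 * len(paths))
    test, train_full = paths[:n_test], paths[n_test:]
    n_val = int(0.1 * len(train_full))
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test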
In addition, we use non-maximum suppression (NMS) to remove overlapping bounding boxes, so that an object is contained in only one bounding box. As an important parameter in NMS for evaluating the overlap of predicted bounding boxes, the IoU is defined as

A_o = area(O_p ∩ O_gt) / area(O_p ∪ O_gt).

In our experiments, the IoU threshold is set to 0.5.
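The IoU definition and the greedy NMS step can be written out as follows. This is a generic sketch (boxes given as [x1, y1, x2, y2] with confidence scores), not the exact implementation used in the experiments.

# Generic IoU and greedy NMS sketch; boxes are [x1, y1, x2, y2] with scores.
def iou(box_a, box_b):
    # Intersection over union of two axis-aligned boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # Keep the highest-scoring box, drop boxes overlapping it by more than
    # the IoU threshold, and repeat. Returns indices of the kept boxes.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep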
In the training stage, we set the number of training epochs to eight and save the model parameters after each epoch. If the training loss and validation loss remain stable, we conclude that the detector is well trained. Otherwise, we modify the epoch setting and resume training until the model is well trained.

Precision is a metric that reflects false detections. The calculation of Precision in this paper is the same as in general visual object detection, which traverses all predicted boxes. If the UAV is successfully detected, the predicted bounding box is regarded as a true positive (TP). Otherwise, it is regarded as a false positive (FP). Precision is defined as

Precision = TP / (TP + FP).

Recall is a metric that reflects missed detections and is defined as

Recall = TP / (TP + FN).
The performance of an object detector can be evaluated by the Precision-Recall (P-R) curve, which considers false detections with respect to missed detections for varying thresholds. However, P-R curves are often zigzag curves that go up and down and tend to cross each other frequently, so it is usually not easy to compare different curves (different detectors) in the same plot. Instead, a numerical metric called Average Precision (AP) can help us compare different detectors. AP is the area under the curve (AUC) of the P-R curve, and it is easy to compare areas. Thus, we use AP as the evaluation metric.
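A minimal sketch of how Precision, Recall, and AP (the area under the P-R curve) can be computed from scored detections matched against ground truth at IoU ≥ 0.5 is given below. The greedy matching and the monotone interpolation of the curve are common conventions, assumed here; they are not necessarily identical to the evaluation code used in the paper. The sketch reuses the iou() helper from the NMS sketch above.

# Sketch of AP as the area under the Precision-Recall curve.
# detections: list of (image_id, score, box); gts: dict image_id -> list of boxes.
import numpy as np

def average_precision(detections, gts, iou_thresh=0.5):
    n_gt = sum(len(b) for b in gts.values())
    dets = sorted(detections, key=lambda d: d[1], reverse=True)
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp, fp = [], []
    for img, _, box in dets:
        cands = gts.get(img, [])
        ious = [iou(box, g) for g in cands]
        best = int(np.argmax(ious)) if ious else -1
        hit = best >= 0 and ious[best] >= iou_thresh and not matched[img][best]
        if hit:
            matched[img][best] = True  # each ground-truth box is matched at most once
        tp.append(1 if hit else 0)
        fp.append(0 if hit else 1)
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    recall = tp / max(n_gt, 1)
    precision = tp / np.maximum(tp + fp, 1)
    precision = np.maximum.accumulate(precision[::-1])[::-1]  # monotone P-R curve
    return float(np.trapz(precision, recall))                 # area under the curve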
V. EVALUATION RESULTS

A. Average Precision

The APs of the eight algorithms are shown in Table II. Grid R-CNN achieves the best performance (82.4%) among all detectors, while RefineDet achieves the worst (69.5%). Among the two-stage networks, Cascade R-CNN achieves the best performance (79.4%), whereas Faster R-CNN, which is the main framework of the two-stage networks in our experiments, achieves the worst (70.5%). Among the one-stage networks, SSD512 (78.7%) and RetinaNet (77.9%) both perform well, whereas YOLOv3 achieves only 72.3%. Although one-stage networks sacrifice detection performance to obtain high implementation efficiency, SSD512 achieves the same AP as FPN, which suggests that SSD512 could be a good alternative for tasks requiring high computational efficiency.

To further evaluate the performance of the algorithms, we split the testing set into two sets. One set, named Det-Fly-Simple, contains images with a relatively simple background (e.g., clean sky), a short sensor-target range, and a low flight speed. The other set, named Det-Fly-Complex, contains images with a more complex background (e.g., complex urban scenes) and a small target size. Each subset contributes about 50% of the images of the entire dataset. The evaluation results on Det-Fly-Simple suggest that the two-stage networks Cascade R-CNN and Grid R-CNN achieve the highest AP (more than 82.0%) among all eight networks. Among the one-stage networks, RetinaNet and SSD512 achieve the best performance (nearly 81.0%). Except for RefineDet and YOLOv3, the performance of the other algorithms is higher than 80.0%. Compared with Det-Fly-Simple, the detection performance of most of the algorithms on Det-Fly-Complex drops sharply, by nearly 5.0% on average, due to the high complexity of Det-Fly-Complex. The mean Precision of the algorithms reaches only 74.4%. In particular, Grid R-CNN still achieves the best performance, and it is also the only algorithm exceeding 80.0%. RetinaNet and SSD512, which have similar performance, still perform best among the one-stage networks. In general, two-stage networks perform a little better than one-stage networks in this test.

In summary, Grid R-CNN and Cascade R-CNN show stable and superior performance compared to the others in all evaluation scenarios. The one-stage networks SSD512 and
TABLE II
THE AP OF THE EIGHT ALGORITHMS TESTED ON DET-FLY (%)
TABLE III
THE AP FOR DIFFERENT ENVIRONMENTAL BACKGROUND SCENES (%)
TABLE IV
THE AP FOR DIFFERENT CHALLENGING CONDITIONS (%)
Fig. 4. The inference time of all algorithms in our experiment.
TABLE V
THE AP OF THE EIGHT ALGORITHMS TESTED ON MIDGARD (%)

Fig. 5. The AP of the algorithms for different target scales. If both the width and height of the annotated bounding box are, respectively, smaller than x (x ∈ {1/40, 1/20, 1/10}) of the width and height of the entire image, then it is classified as < x[W, H]. The AP is calculated by the algorithms with the data in the corresponding intervals. The mAP represents the mean AP of the eight algorithms in each scale interval.

Fig. 6. The AP for different viewing angles. This figure is divided into three parts, which are Top (top view), Fro (front view), and Bot (bottom view). The vertical axis of each part, which is the AP of the algorithms, ranges from 0.5 to 1.0. The mAP of each part is about 0.78 (Top), 0.72 (Fro), and 0.85 (Bot), respectively. The marker in each part represents the performance of each algorithm.
with our intuition that the complex urban background makes visual UAV detection very challenging.

As for the performance of the algorithms, Grid R-CNN shows consistent and high Precision across different types of background scenes, whereas the performance of Faster R-CNN and RefineDet drops rapidly when the background complexity increases.

2) Target Scales: The size of the target UAV in the image has a great impact on the detection performance. Fig. 5 shows the APs of all the algorithms with respect to the target size/ratio. As shown in the figure, the APs of all algorithms increase at different rates as the target scale increases. In particular, Grid R-CNN shows the best performance across different target scales, whereas the performance of RefineDet and Faster R-CNN drops rapidly when the target scale becomes small.
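The scale buckets used in Fig. 5 can be reproduced as follows; this is a direct transcription of the rule in the caption, and treating larger targets as a fourth group is an assumption.

def scale_bucket(box_w, box_h, img_w, img_h):
    # Fig. 5 rule: a target falls in "< x[W, H]" if both its width and height
    # are smaller than x times the image width and height, respectively.
    for x, label in ((1 / 40, "< 1/40 [W, H]"),
                     (1 / 20, "< 1/20 [W, H]"),
                     (1 / 10, "< 1/10 [W, H]")):
        if box_w < x * img_w and box_h < x * img_h:
            return label
    return ">= 1/10 [W, H]"  # assumption: larger targets form a fourth group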
3) Viewing Angles: It is noticed from our experiments that the viewing angle of the target UAV also has an impact on the detection performance. Fig. 6 shows the AP for different viewing angles. It can be seen that the bottom view leads to the highest Precision, whereas the front view leads to the lowest. The reason is that, for the bottom-view cases, the target shows rich geometric information and, in the meantime, the background scene is a blue or cloudy sky. However, for the front-view cases, the target is flat and hence shows less geometric information and, in the meantime, the background can be more complex than in the bottom-view case.

4) Other Challenging Conditions: The dataset covers some challenging conditions such as strong/weak lighting, motion blur, and partial occlusion. The ratios of the images of the three scenarios in our dataset are 10.8%, 11.2%, and 0.8%, respectively. Here, partial occlusion refers to the case where part of the target UAV is out of the field of view. All the images in these cases can be found online in our dataset.

The testing results of the eight algorithms under the three challenging conditions are reported in Table IV. It is notable that partial occlusion causes a much lower AP. Part of the reason is that partially occluded target detection is indeed a challenging task and, in the meantime, the images of this case occupy only a small proportion of the dataset. On the other hand, strong/weak lighting conditions and motion blur do not compromise the performance significantly, which verifies the robustness of the deep learning algorithms.

D. Comparison With the State-of-the-Art Dataset

To the best of our knowledge, MIDGARD is the latest comprehensive dataset designed for deep-learning-based micro-UAV detection [32]. Compared to MIDGARD, the annotation bounding box of each image in Det-Fly is tighter, because the images in Det-Fly are annotated one by one manually by professionals, whereas the images in MIDGARD are automatically annotated based on UVDAR and relative pose estimation. Moreover, Det-Fly covers a wider range of relative target distances. In particular, the longest relative target distance in Det-Fly reaches more than 100 m, whereas the longest distance in MIDGARD is less than 20 m. Due to the wide range of relative distances, the scale of the target UAV in Det-Fly is more diverse.

The eight algorithms have been trained and tested on MIDGARD. The testing results are shown in Table V. As can be seen, the results on MIDGARD are 10% better than those on Det-Fly. This might be caused by the higher complexity and diversity of the samples in Det-Fly.
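The per-condition numbers reported in this section (background scene, target scale, viewing angle, Det-Fly versus MIDGARD) all amount to computing AP over subsets of the test set. The sketch below shows that bookkeeping, assuming each test image carries a metadata record with the relevant tag; the metadata structure and field names are hypothetical, and average_precision() is the sketch given earlier.

# Sketch of per-condition evaluation: group test images by a metadata tag
# (e.g., "background", "view", "dataset") and compute AP within each group.
from collections import defaultdict

def ap_by_condition(detections, gts, metadata, tag):
    groups = defaultdict(set)
    for img_id, meta in metadata.items():
        groups[meta[tag]].add(img_id)  # field names in `metadata` are hypothetical
    results = {}
    for condition, img_ids in groups.items():
        dets = [d for d in detections if d[0] in img_ids]
        sub_gts = {i: gts[i] for i in img_ids if i in gts}
        results[condition] = average_precision(dets, sub_gts)
    return results  # maps each condition value to its AP on that subset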
VI. CONCLUSION

This letter presented a new dataset, named Det-Fly, for air-to-air UAV detection and evaluated eight representative deep-learning algorithms based on this dataset. Not only is the overall performance of the algorithms carefully evaluated and compared, but the impact of environmental background, target scales, viewing angles, and other challenging conditions on the detection performance is also analyzed. According to the experimental results, suggestions on how to design algorithms that achieve better detection performance in the future are given.

In the future, to detect unknown UAVs in various environments, the dataset should be further enhanced by adding more types of UAVs and background scenarios. Moreover, an ablation study is necessary to design deep-learning algorithms that are specifically suitable for UAV detection tasks and can be implemented onboard. In addition, interpretability techniques may be adopted to explain why the recommended network structures or methods improve detection performance. Algorithms that are able to process high-resolution images also need more attention.

REFERENCES
[1] Y. Tang et al., "Vision-aided multi-UAV autonomous flocking in GPS-denied environment," IEEE Trans. Ind. Electron., vol. 66, no. 1, pp. 616–626, Jan. 2019.
[2] J. Xie, J. Yu, J. Wu, Z. Shi, and J. Chen, "Adaptive switching spatial-temporal fusion detection for remote flying drones," IEEE Trans. Veh. Technol., vol. 69, no. 7, pp. 6964–6976, Jul. 2020.
[3] R. Mitchell and I. Chen, "Adaptive intrusion detection of malicious unmanned air vehicles using behavior rule specifications," IEEE Trans. Syst., Man, Cybern. Syst., vol. 44, no. 5, pp. 593–604, May 2014.
[4] J. Zhang, C. Hu, R. G. Chadha, and S. Singh, "Maximum likelihood path planning for fast aerial maneuvers and collision avoidance," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2019, pp. 2805–2812.
[5] J. Ren and X. Jiang, "Regularized 2-D complex-log spectral analysis and subspace reliability analysis of micro-Doppler signature for UAV detection," Pattern Recognit., vol. 69, pp. 225–237, 2017.
[6] A. Bernardini, F. Mangiatordi, E. Pallotti, and L. Capodiferro, "Drone detection by acoustic signature identification," Electron. Imag., vol. 2017, no. 10, pp. 60–64, 2017.
[7] R. Yoshihashi, T. T. Trinh, R. Kawakami, S. You, M. Iida, and T. Naemura, "Differentiating objects by motion: Joint detection and tracking of small flying objects," 2017, arXiv:1709.04666.
[8] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2017, pp. 2980–2988.
[9] W. Liu et al., "SSD: Single shot multibox detector," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 21–37.
[10] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018, arXiv:1804.02767.
[11] T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 936–944.
[12] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[13] S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li, "Single-shot refinement neural network for object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 4203–4212.
[14] X. Lu, B. Li, Y. Yue, Q. Li, and J. Yan, "Grid R-CNN," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 7363–7372.
[15] Z. Cai and N. Vasconcelos, "Cascade R-CNN: Delving into high quality object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 6154–6162.
[16] F. Lin, K. Peng, X. Dong, S. Zhao, and B. M. Chen, "Vision-based formation for UAVs," in Proc. IEEE Int. Conf. Control Automat., 2014, pp. 1375–1380.
[17] F. Gökçe, G. Üçoluk, E. Şahin, and S. Kalkan, "Vision-based detection and distance estimation of micro unmanned aerial vehicles," Sensors, vol. 15, no. 9, pp. 23805–23846, 2015.
[18] K. R. Sapkota et al., "Vision-based unmanned aerial vehicle detection and tracking for sense and avoid systems," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2016, pp. 1556–1561.
[19] J. Li, D. H. Ye, T. Chung, M. Kolsch, J. Wachs, and C. Bouman, "Multi-target detection and tracking from a single camera in unmanned aerial vehicles (UAVs)," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2016, pp. 4992–4997.
[20] S. Minaeian, J. Liu, and Y.-J. Son, "Effective and efficient detection of moving targets from a UAV's camera," IEEE Trans. Intell. Transp. Syst., vol. 19, no. 2, pp. 497–506, Feb. 2018.
[21] R. Opromolla, G. Fasano, and D. Accardo, "A vision-based approach to UAV detection and tracking in cooperative applications," Sensors, vol. 18, no. 10, 2018.
[22] Y. Wu, Y. Sui, and G. Wang, "Vision-based real-time aerial object localization and tracking for UAV sensing system," IEEE Access, vol. 5, pp. 23969–23978, 2017.
[23] S. Minaeian, J. Liu, and Y. Son, "Vision-based target detection and localization via a team of cooperative UAV and UGVs," IEEE Trans. Syst., Man, Cybern. Syst., vol. 46, no. 7, pp. 1005–1016, Jul. 2016.
[24] F. Lin, X. Dong, B. M. Chen, K. Lum, and T. H. Lee, "A robust real-time embedded vision system on an unmanned rotorcraft for ground target following," IEEE Trans. Ind. Electron., vol. 59, no. 2, pp. 1038–1049, Feb. 2012.
[25] A. Rozantsev, V. Lepetit, and P. Fua, "Detecting flying objects using a single moving camera," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 5, pp. 879–892, May 2017.
[26] J. James, J. J. Ford, and T. L. Molloy, "Learning to detect aircraft for long-range vision-based sense-and-avoid systems," IEEE Robot. Automat. Lett., vol. 3, no. 4, pp. 4383–4390, Oct. 2018.
[27] C. Aker and S. Kalkan, "Using deep networks for drone detection," in Proc. IEEE Int. Conf. Adv. Video Signal Based Surveill., 2017, pp. 1–6.
[28] A. Schumann, L. Sommer, J. Klatte, T. Schuchert, and J. Beyerer, "Deep cross-domain flying object classification for robust UAV detection," in Proc. IEEE Int. Conf. Adv. Video Signal Based Surveill., 2017, pp. 1–6.
[29] A. Rozantsev, V. Lepetit, and P. Fua, "Flying objects detection from a single moving camera," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2015, pp. 4128–4136.
[30] Y. Chen, P. Aggarwal, J. Choi, and C. C. J. Kuo, "A deep learning approach to drone monitoring," in Proc. Asia-Pacific Signal Inf. Process. Assoc. Annu. Summit Conf., 2017, pp. 686–691.
[31] X. Peng, B. Sun, K. Ali, and K. Saenko, "Learning deep object detectors from 3D models," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1278–1286.
[32] V. Walter, M. Vrba, and M. Saska, "On training datasets for machine learning-based visual relative localization of micro-scale UAVs," in Proc. IEEE Int. Conf. Robot. Automat., 2020, pp. 10674–10680.
[33] V. Walter, M. Saska, and A. Franchi, "Fast mutual relative localization of UAVs using ultraviolet LED markers," in Proc. Int. Conf. Unmanned Aircr. Syst., 2018, pp. 1217–1226.
[34] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.