0% found this document useful (0 votes)

15 views25 pages

Infrared and Visible Camera Integration For Detection

This article evaluates the detection and tracking of small non-cooperative UAVs using Electro-optical and Infrared sensors, comparing decision-level and pixel-level data fusion techniques against independent sensor use. The study reports an average precision of 88.4% and highlights the advantages of decision-level fusion in recall and precision, despite lower frame rates. Additionally, the research emphasizes the need for spatially and temporally aligned datasets for effective UAV detection and tracking.

Uploaded by

satyam.drode22

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views25 pages

Infrared and Visible Camera Integration For Detection

Uploaded by

satyam.drode22

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 25

drones

Article
Infrared and Visible Camera Integration for Detection and
Tracking of Small UAVs: Systematic Evaluation
Ana Pereira 1 , Stephen Warwick 2 , Alexandra Moutinho 3 and Afzal Suleman 2,3, *

1 Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal; [email protected]

2 Department of Mechanical Engineering, University of Victoria, Victoria, BC V8P 5C2, Canada;
[email protected]
3 IDMEC, Mechanical Engineering Institute, Instituto Superior Técnico, Universidade de Lisboa,
1049-001 Lisbon, Portugal; [email protected]
* Correspondence: [email protected]

Abstract: Given the recent proliferation of Unmanned Aerial Systems (UASs) and the consequent
importance of counter-UASs, this project aims to perform the detection and tracking of small non-
cooperative UASs using Electro-optical (EO) and Infrared (IR) sensors. Two data integration tech-
niques, at the decision and pixel levels, are compared with the use of each sensor independently
to evaluate the system robustness in different operational conditions. The data are submitted to a
YOLOv7 detector merged with a ByteTrack tracker. For training and validation, additional efforts are
made towards creating datasets of spatially and temporally aligned EO and IR annotated Unmanned
Aerial Vehicle (UAV) frames and videos. These consist of the acquisition of real data captured from a
workstation on the ground, followed by image calibration, image alignment, the application of bias-
removal techniques, and data augmentation methods to artificially create images. The performance
of the detector across datasets shows an average precision of 88.4%, recall of 85.4%, and [email protected] of
88.5%. Tests conducted on the decision-level fusion architecture demonstrate notable gains in recall
and precision, although at the expense of lower frame rates. Precision, recall, and frame rate are not
improved by the pixel-level fusion design.

Keywords: YOLOv7; ByteTrack; electro-optical sensor; infrared sensor; data fusion

Citation: Pereira, A.; Warwick, S.;

Moutinho, A.; Suleman, A. Infrared
and Visible Camera Integration for
Detection and Tracking of Small
1. Introduction
UAVs: Systematic Evaluation. Drones Aside from the advantages an Unmanned Aerial System (UAS) offers in a variety
2024, 8, 650. https://doi.org/ of military and civilian applications, the number of records of malicious activity caused
10.3390/drones8110650 by the lack of regularization, negligence, or criminal intent has been growing recently [1].
Academic Editor: Seokwon Yeom
As a result, there has been a significant increase in efforts dedicated to the research and
development of Counter-UAS (C-UAS) frameworks capable of countering threats posed by
Received: 17 October 2024 hostile UASs, and improving safety, security, and privacy.
Revised: 3 November 2024 The architecture of a C-UAS system can essentially be separated into two main compo-
Accepted: 4 November 2024 nents [2]. The perception of the Unmanned Aerial Vehicle (UAV) is first achieved through
Published: 6 November 2024
the sensing step, which uses one or more sensors to detect and distinguish it from other
objects. This is followed by the classification step, which verifies the object and identifies
it to ascertain whether it is malicious or illegal [3]. Then, localization in terms of the rel-
Copyright: © 2024 by the authors.
ative position and velocity is conducted, as well as the tracking of the target in order to
Licensee MDPI, Basel, Switzerland. follow its trajectory. The final step is neutralization which involves the mitigation of the
This article is an open access article threat by interfering with or disabling it. These systems can be categorized as electronic
distributed under the terms and or kinetic–mechanical neutralizers. Radio Frequency (RF) jamming, GNSS jamming, and
conditions of the Creative Commons spoofing are some examples of electronic neutralizers that use electronic waves to disrupt
Attribution (CC BY) license (https:// the performance of the target. Kinetic–mechanical techniques are based on the physical
creativecommons.org/licenses/by/ interception of the UAV through the use, for instance, of projectiles, collision UAVs, or
4.0/). nets [4].

Drones 2024, 8, 650. https://doi.org/10.3390/drones8110650 https://www.mdpi.com/journal/drones

Drones 2024, 8, 650 2 of 25

Related Work
Several distinct approaches to UAV detection have been introduced over the years, as
explored in [5]. The most frequently used sensors are RADAR, LiDAR, Electro-optical (EO)
and Infrared (IR) imaging cameras, RF based, and acoustic. Each of these sensors has been
widely utilized for the detection of objects for different applications, not necessarily for
UAVs due to their small size constraint. The main characteristics of each sensor, focused on
the detection of small objects, are described as follows:
• RADAR: RADAR transmits electromagnetic waves that are reflected by targets, with
a frequency range from 3 MHz to 300 GHz. The interpretation of the reflected rays
may determine the position and velocity of the objects. A significant issue with
the use of RADAR for the detection of UAVs is raised by their low RADAR cross-
section area, that might make them undetectable. However, because RADAR presents
a high robustness to weather and lighting conditions, research on micro-Doppler
signature-based methods has been conducted [6]. In [7], the micro-Doppler effect for
frequency-modulated continuous-wave (FMCW) RADAR applications is modelled,
showing high confidence rates for UAV class identification based on the number of
UAV motors. In [8], an X-band pulse-Doppler RADAR is used to compare the RADAR
signatures of fixed-wing UAVs with only puller blades, multirotor UAVs with only
lifting blades, and VTOL UAVs with both lifting and puller blades, which can help
identify UAV types.
• LiDAR: LiDAR shares the working principle of RADAR, although having a higher
frequency range, from 200 THz to 400 THz. It is also able to issue a 3D map of the
environment. Although it is less robust to weather, it is still a valuable tool to initialize
the position of the object in a detection system. In [9], a probabilistic analysis of the
detection of small UAVs in various scenarios is proposed.
• RF sensor: These passive sensors capture the signals used by a target to communicate
with the ground, making it possible to detect, locate and also, in some cases, identify
the aircraft. Apart from being robust to weather and lighting, one important feature of
RF sensors is the possibility of detecting the controller on the ground, which is relevant
in countering a threat. In [10], spectral–temporal localization and classification of
the RF signals of UAVs with a visual object detector approach are performed. This
shows promising results, even in noise interference situations, by processing the
spectrograms using the YOLOv5 object detector. In [11], a novel RF signal image
representation scheme that incorporates a convolutional neural network (CNN) is
implemented to perform UAV classification, achieving high classification accuracy
scores of 98.72% and 98.67% on two different datasets.
• Acoustic sensor: An acoustic sensor can detect, distinguish, and identify the sound
emitted by the engine and propellers of a UAV. By using a specific arrangement of
multiple microphones, the estimation of the azimuth and elevation of one or more
UAVs is possible. However, this sensor presents some limitations in terms of the
detection range and accuracy, and susceptibility to background noise interference,
even though it is a low-cost and accessible tool. In [12], good performance results of
the detection and localization of small UAVs are achieved by using an acoustic-based
surveillance system. A UAV detection system based on acoustic signatures that are fed
to two machine learning models, a Random Forest and Multilayer Perceptron (MLP),
is proposed [13]. The MLP model was considered a better solution for the detection of
complex and nonlinear acoustic features.
• EO camera: An EO camera allows for the detection of objects by capturing the light
reflected by them. Although being intuitive for interpretation, and capable of pro-
viding detailed information on the surrounding environment, these sensors present
low robustness to low lighting scenes, namely at night, and weather conditions, such
as rain and fog. For visual object detection, different computer vision algorithms
based on Deep Learning (DL) models have been developed. A comparison between
14 object detectors with the proposed visual UAV dataset is conducted in [14]. This
Drones 2024, 8, 650 3 of 25

makes conclusions based on the performance and processing time. In [15], a detection
and tracking system for small UAVs using a DL framework that performs image
alignment is proposed. Results with high-resolution images show a track probability
of more than 95% up to 700 m. The problem of distinguishing small UAVs from birds
is addressed in [16]. It concludes that object detectors benefit from being trained with
datasets that include various UAVs and birds in order to decrease the number of False
Positive (FP) detections on inference.
• IR sensor: IR sensors, that capture the thermal signature of objects, are much explored
in the military sector, and are more adequate for some challenging scenarios. Partic-
ularly using Long-Wave Infrared (LWIR) sensors, the thermal signatures emitted by
the batteries of UAVs can be detected. These sensors typically have a lower imaging
resolution and higher granularity, which limit their use independently. These issues
that result in a lack of texture and feature highlight are addressed in [17]. An im-
proved detector that drops low-resolution layers and enhances high-resolution layers
is proposed. It also includes a multi-frame filtering stage that consists of an adaptive
pipeline filter (APF) to reduce the FP rate, achieving a precision of more than 95%.
Promising results in small-UAV detection using an IR sensor are achieved in [18],
by learning the nonlinear mapping from an input image to the residual image and
highlighting the target UAV by subtracting these images.
In particular for vision-based object detection, which is one of the most relevant
tasks of computer vision, DL methods are commonly exploited [19]. The object detectors
take an image as input and output the bounding boxes of the detected objects with the
corresponding labels. In turn, multi-object trackers provide continuity in time by connecting
sets of detections across time, by assigning an ID to each one, without any previous
knowledge of their location.
To either tackle different problems or to enhance performance and results, the usage of
more than one sensor is advantageous. For imaging sensors, decision-level and pixel-level
data fusion approaches have shown promising results for various applications. In [20], a
decision-level approach that bases the detection on the sensor with the highest confidence
score at the output stage is developed. In [21], context-aware fusion at pixel level is
performed after image segmentation for traffic surveillance applications. In this case, a
merged image from the output of both cameras is created to be detected. One of its most
relevant challenges is the requirement for spatially and temporally aligned data.
Publicly available datasets of spatially and temporally aligned EO and IR images
for the detection of UAVs are scarce. In [22], three methods to obtain paired EO and
IR images for the detection of cars and people are studied. These include a Generative
Adversarial Network (GAN) algorithm to generate images, a simulation environment,
and a combination of both. The results show a poor performance of the detector when
trained with the synthetic datasets and tested with real data. Even so, traditional data
augmentation techniques such as image manipulation, image erasing, and image mixing
can benefit a dataset and improve the detection results.
Small-object vision-based detection remains a challenging topic, despite extensive
progress over the years to improve both the processing time and accuracy of these systems.
Comparative research into sensors and sensor integration techniques is important.
The present work aims to perform the detection and tracking of small non-cooperative
UAVs using EO and IR imaging sensors. It makes a comparison between the use of each
sensor independently and two sensor fusion algorithms, at the decision and pixel levels.
Since the main objective is to use real data to evaluate the performance of the architectures
in different conditions and scenarios, a further goal is the construction of the necessary
spatially and temporally aligned EO and IR datasets of UAVs. This includes the creation of
artificial data and the acquisition of real data during flight experiments (FEs) performed at
the University of Victoria’s Center for Aerospace Research (UVIC-CfAR). By conducting
extensive robustness tests and validating the system using real flight data, a conclusion
Drones 2024, 8, 650 4 of 25

on the most appropriate method in terms of performance and real-time capabilities for the
cases explored is made.

2. Detection and Tracking Architecture

2.1. Proposed System Architecture
The system architecture depends on the data fusion methodology that can happen at
the output or input level from the point of view of the detector.
The decision-level data fusion case is a late-fusion method, in which a decision is
made by combining the outputs of different algorithms previously processed. In this case,
firstly, both images are fed to separate detectors previously trained for the corresponding
image types. Then, the algorithm takes both output detections and computes the final
result before feeding it to the tracker. This process for a single frame is shown in Figure 1.

Figure 1. Decision-level data fusion stages.

For the pixel-level data fusion, the raw pixels from multiple sources are combined, so
the fusion occurs on a pixel basis. This is an early-fusion method because it occurs before
image classification. In this case, in the first step, both images are merged into a single one
that preserves the relevant features of each, and then the resulting image is submitted to a
single detector and tracker. The steps of this method are depicted in Figure 2.

Figure 2. Pixel-level data fusion stages.

In this project, the object detector selected was YOLOv7 that uses a deep CNN in
order to identify objects in images [23]. It belongs to the current state of the art for the
real-time detection You Only Look Once (YOLO) family of object detectors. This choice
was based on the performance improvements both in terms of accuracy but especially
processing time that the authors of YOLOv7 managed to achieve. Even though more recent
versions are available at present, not enough literature had been produced for those at the
time of this selection, so the YOLOv7, which had been extensively reported, was selected.
Despite the development and progress of object detectors in recent years, there are still
relevant challenges that were considered in this work. These include intra-class variation,
where detectors may fail to detect objects of the same class not integrated in the dataset,
and inter-class variation, where object detectors may fail to distinguish different classes.
Hardware requirements are also a relevant limitation since computer vision algorithms
often have a high demand for memory and lead to intensive training sessions.
As for the tracker, the state-of-the-art ByteTrack was selected [24]. ByteTrack is a
tracking-by-detection tracker that uses Intersection over Union (IoU) to associate detections
provided by the object detector with the tracks stored in memory. Based on the detection
results, the tracker creates an ID of the object and follows its trajectory in consecutive
frames, thus giving it the same ID, or a new instance, if it is a new different detected object.
It also uses a Kalman filter to make predictions on the position of the objects in the current
frames, given their location in the previous frames. As opposed to most trackers, ByteTrack
keeps all the detections provided by the associated detector to increase robustness to
cases of occlusion, motion blur, and variable bounding box size increases. In this project,
Drones 2024, 8, 650 5 of 25

the tracking task performed refers to following the trajectory of a UAV in videos, not to
following its trajectory during its flight by having the sensors autonomously move.
Image alignment in consecutive frames was additionally considered for the system
to compensate for camera motion, before submission to the tracker. Here, Enhanced
Correlation Coefficient (ECC) maximization was chosen to estimate the parameters of the
motion models for the system [25]. This gradient-based iterative method is robust against
geometric and photometric distortions. Previous work developed at UVIC-CfAR showed
tracking improvements when this algorithm was applied [26].

2.2. Data Fusion Methodology

The decision-level data fusion algorithm used in this project was developed with the
main goal of improving the detection task results with a minimal impact on the processing
time. For this reason, it considers one data type as the main one, thus favouring one of
the cameras, and the other one as confirmation data only accessed in certain scenarios.
According to this principle, the frames produced by the head sensor, that can be either
camera, are always fed to the corresponding head detector and its predictions processed.
Based on the characteristics of these detections and the need for a complement on the
information gathered, the confirmation sensor may be used. The criteria used to determine
the need for a confirmation model for each frame is mainly based on two factors related to
the predictions of the head detector: the number of predictions in every frame, that aims
to eliminate FP or False Negative (FN) detections, and the confidence score associated.
When the confirmation model needs to be accessed, the system has two prediction sets,
the head, and the confirmation predictions, and a balance between them is calculated.
The first step is to match the detection pairs between both image types, if existent, that is,
determine whether the detections from the head and confirmation models are of the same
object or different instances. Once the matches are computed, the confidence thresholds
of the detections are analysed and compared to two confidence thresholds, the primary
and the secondary thresholds, previously established. It is sufficient that an object is
detected only in one image type with a confidence score higher than the primary threshold
to be considered valid, but it is necessary that an object is detected in both models with
confidence scores higher than the secondary threshold to be valid, if none is higher than
the primary threshold. In this project, both the EO and the IR sensors were tested for both
positions. This approach to the fusion has the advantage of saving detection processing
time since there is a reduced number of instances when both models are loaded. This
algorithm requires that the videos from both sensors have the same frame rate.
For pixel-level data fusion, extensive research on algorithms developed for the fusion
of EO and IR images, despite not being specific for the UAV case, was conducted [27]. It
contains a comparative study on both algorithm performances through the analysis of the
resulting fused images on the selected metrics, and processing time. In this project, the
requirement for an adequate fusion time per image was considered more important and, so,
the algorithm selected, that achieved a good performance in the analysis in the mentioned
paper, is called FusionGAN [28]. This aims at producing enhanced greyscale images by
taking advantage of the thermal information while maintaining visible textures. Here,
the GAN generator is trained to create images that take into consideration the thermal
intensities of the IR image along with the image gradients present in the EO images. The
GAN discriminator is trained to maximize the presence of the features of the EO image
in the fused image. The final resulting image is selected when the discriminator can no
longer distinguish the images produced by the generator from real ones. An example of the
application of the FusionGAN is shown in Figure 3. One advantage of this algorithm is that
it is possible to perform the fusion of the images when they do not have the same resolution
by up-sampling the low-resolution image. This is important because the available EO
cameras generally have much higher resolutions than IR sensors. A relevant limitation of
this algorithm is the high amount of GPU memory it requires to process the images and
produce the fusion.
Drones 2024, 8, 650 6 of 25

(a) (b) (c)

Figure 3. FusionGAN algorithm application: (a) Input Electro-optical (EO) image. (b) Input Infrared
(IR) image. (c) Pixel Fused output image.

3. Dataset
Two datasets were created due to the requirements imposed by the data fusion method-
ologies selected, using data collected during flight experiments: the labelled dataset and
the inference dataset. The labelled dataset was used to train object detectors, and included
spatially and temporally aligned variations of EO, IR, and Pixel Fused real and artificial
images. Each of these sub-datasets consisted of a total of 5977 labelled images of UAVs.
In turn, the inference dataset consisted of EO, IR, and Pixel Fused real videos that were
spatially and temporally aligned but not labelled, that totalled 35,907 frames.

3.1. Experimental Work

The main components of the video capture system used at UVIC-CfAR were the
cameras, the analogue-to-digital signal converters, and the software used. Figure 4 shows
the experimental setup, where the sensors are highlighted in red.

Figure 4. Experimental setup with highlight on the sensors.

The EO camera sensor was a SONY FCB-EX1020 PAL and the IR camera sensor was a
FLIR TAU 640 PAL. These sensors were integrated in a TASE 200 gimbal, so there was a fixed
displacement between them. This displacement was measured to be 50 mm ± 0.5 mm. The
camera parameters were controlled using ViewPoint software. Each sensor was integrated
with a low-latency video encoder, Antrica ANT-1772, that can stream in both RTSP and
MPEG TS formats over an Ethernet connection. The software Neptune Guard was used to
configure each encoder. The streams were displayed and recorded with Neptune Player
that has a very-low-latency viewer. The OBS program was also used simultaneously for
recording and live streaming the same videos.
Drones 2024, 8, 650 7 of 25

The equipment was on a workstation on the ground capturing the aircraft in the air
and not mounted onboard an aircraft. The operator manually moved the gimbal to include
the aircraft in its field of view, and no autonomous gimbal movement was used. This means
that the cameras were fixed with respect to each other, but the gimbal was moving to record
the aircraft. Both sensors were always set to start recording at the same time to contribute
to the temporal alignment of the frames.
In case this system is mounted onboard an aircraft in future work, the main equipment
change that needs to be made is replacing the computing unit with an embedded system.
The weight of the whole system to be deployed must also be taken into consideration. The
selection of the detector aircraft will depend on this payload weight, which would include
an embedded system, the TASE 200 gimbal, the two video encoders, batteries to power
these components, digital datalinks for communication with the ground station, and the
necessary cables.
First, sensor calibration was performed for the two sensors independently to determine
their geometric parameters since image alignment is essential to guarantee the success
of the FusionGAN algorithm. The MATLAB simple camera calibration app was used for
the calibration. Based on the Pinhole Perspective camera model, the intrinsic, distortion,
and extrinsic parameters of each sensors were estimated [29]. The calibration results are
presented in Table 1, where the most significant difference is observed in the distortion
coefficients, namely for radial distortion. In fact, the distortion that the IR sensor causes in
the images is noticeable to the naked eye in some cases. However, this is found to have a
negligible impact on the alignment of the UAVs at longer ranges.

Table 1. Sensor calibration results.

Sensor Calibration Matrix Distortion Coefficients

2.5609 × 103
 
0 313.4439
EO  0 2.9276 × 103 360.1981 {2.1355, −5.5289, −1.5735 × 10−4 , 0.1064}
0 0 1
2.4726 × 103
 
0 283.0549
IR  0 2.7449 × 103 194.0649 {−0.3653, 23.3465, −0.0247, −0.020}
0 0 1

One aspect to consider was that the IR sensor could not capture a defined image
at a close range. For this reason, a calibration board printed in the standard A4 or A3
sizes would be too small to be captured, appearing blurry through the IR sensor, which
would make the calibration process impossible. Instead, a 10 × 7 calibration board with
15 cm × 15 cm squares, totalling 150 cm × 105 cm, was built to capture the images, as can
be seen in Figure 5.

(a) (b)
Figure 5. Cont.
Drones 2024, 8, 650 8 of 25

Figure 5. Calibration procedure using the calibration board created: (a) EO image at close range.
(b) EO image at far range. (c) IR image at close range. (d) IR image at far range.

The goal of the most relevant flight experiments was to collect data containing as
much variety in operational conditions as possible, in the form of videos, in mp4, recorded
at 25 fps, to integrate the dataset to train the object detector models and test the system.
Flight Experiment A was an operation at UVIC-CfAR with the aircraft Mini-E [30],
illustrated in Figure 6a. This aircraft was flying in circles passing through the waypoints
shown in Figure 7a. Additional flights of a DJI Mavic 2, displayed in Figure 6b, were
performed in the same day to guarantee more variety on a more common aircraft. This
UAV was flying following straight lines as can be seen in Figure 7b. The total recorded
flight time was 51 min 21s for the Mini-E and 22 min 28 s for the DJI Mavic 2. During the
postprocessing stage of the raw videos, it was concluded that the EO and IR frames were
not always totally aligned. This happened mostly when one of the sensors of the gimbal
automatically adjusted a camera parameter, creating a lag on the transmission of the videos.

(a) (b) (c)

(d) (e) (f)

Figure 6. UAVs captured during flight experiments: (a) FE A—Mini-E. (b) FE A—DJI Mavic 2. (c) FE
B—MIMIQ. (d) FE C—DJI Inspire 1. (e) FE D—Zeta FX-61 Phantom Wing. (f) FE D—DJI Mini 3 Pro.
Drones 2024, 8, 650 9 of 25

(a) (b) (c)

Figure 7. Schematics of flight paths: waypoints (blue) and workstation (red): (a) Flight Experiment A
(Mini-E). (b) Flight Experiment A (DJI Mavic 2). (c) Flight Experiment C.

Flight Experiment B was also an operation at UVIC-CfAR with the main goal of
gathering data on the hybrid multirotor [31], shown in Figure 6c. The flights captured
include vertical take-off, hovering, and landing. Since this flight experiment was performed
inside a gymnasium and hence the background is similar in all frames, only a total of 2 min
10 s was recorded.
Flight Experiment C was conducted at UVIC-CfAR with the main goal of gathering
footage to include in the dataset to test the system on inference. This includes both the same
DJI Mavic 2 of Flight Experiment A, shown in Figure 6b, to assess the different performance
results of the system with the same aircraft but different conditions, and also a DJI Inspire
1, that can be seen in Figure 6d, to test the robustness of the system to intra-class variation.
The flight paths chosen for this experiment were the same for both UAVs, that followed
straight lines as depicted in Figure 7c. The DJI Mavic 2 and DJI Inspire 1 were recorded
during a total of 19 min 18 s and 24 min 35 s, respectively.
Additional data were collected on Flight Experiment D conducted at Instituto Superior
Técnico (IST) with a TeAx ThermalCapture Fusion Zoom. The frames provided include a
Zeta FX-61 Phantom Wing and a DJI Mini 3 Pro, as can be seen in Figure 6e,f, respectively.
In general, the data include frames with the UAV blurred or partially cut, the presence
of birds in some frames, frames above and below the local horizon, and a background with
variety in objects, especially trees, houses, and farming tools. There is also variety in the
range of the UAV and its position in the frames. In terms of lighting, variety includes bright
images taken during summer days, indoor images with artificial lighting, and images taken
at twilight in autumn.

3.2. Labelled Dataset

The real data in the labelled dataset are the data collected during Flight Experiments
A, B, and D.
As mentioned, the data gathered during Flight Experiments A and B was not com-
pletely aligned, so postprocessing included calibration, spatial and temporal alignment,
and the application of a method for bias removal.
The image calibration eased the spatial alignment of the images by effectively adjusting
several parameters, including tangential and radial distortions. Nevertheless, aligning two
images that were captured by distinct sensors is a challenging task. For this reason, the
image alignment was a supervised procedure, mainly due to three reasons encountered.
First, some video frames were not perfectly aligned in time. Secondly, the IR sensor has a
much slower shutter speed than the EO sensor, which often led to motion blur for higher
UAV velocities. Since altering shutter speeds was not an option of the software used, a
decision was made to include this type of pairs in the dataset. Lastly, the parallax effect
results from the fixed displacement of the camera lenses when they were mounted in the
gimbal. This effect was more noticeable when the UAV was near the sensors, and negligible
at long range, having no effect on alignment.
For Flight Experiments A and B, further work was carried out to eliminate the dark
gradients that the IR sensor created and that are especially pronounced in the corners of
Drones 2024, 8, 650 10 of 25

the images. Computer vision algorithms may be significantly impacted by this kind of
effect. First, this was considered a vignette effect, that is a brightness attenuation away
from the image center, and treated as such, using a method to estimate it from a single
image [32]. This approach did not yield satisfactory vignette function estimations for all
the images. For this reason, it was considered an intensity nonuniformity, that is a bias
that can be caused by illumination changes, thus taking the perturbation as a variation in
intensity that does not take a specific distribution [33]. By using this strategy, the results for
gradient estimation and hence bias removal from the images was acceptable. One result
example of this procedure is shown in Figure 8. The usage of such an approach can generate
more noise, so, for research purposes, the IR dataset was duplicated and the bias-removal
algorithm applied to the copy. A Pixel Fused dataset with the FusionGAN algorithm using
as inputs the EO and IR with bias removed was also created. The goal was to evaluate the
effect of the image correction by comparing the performance of the object detectors with
the original images and bias-corrected images.

(a) (b) (c)

Figure 8. Bias-removal algorithm application: (a) IR original image. (b) IR bias-corrected image.
(c) Estimated bias.

For the Flight Experiment D data, the software used for data capture performed
image alignment, so the frame selection was solely supervised to guarantee the elimination
of outliers.
For the real data captured, the UAV had to be labelled by outlining its bounding box,
assigning it a class, and producing a .txt file in the YOLO format. The accuracy of the labels
associated with each object has a significant impact on the performance of an object detector.
Two main labelling strategies were considered. First, by sending the data to another object
detector trained for the same purpose, a DL-assisted methodology could be implemented.
The label files could be created using the output detections of this extra detector. However,
this method has an additional source of error, since it is dependent on the precision, recall,
and accuracy of this detector, which might have significant impacts on the dataset. For
this reason, although being more time-efficient, to guarantee a good outcome, this method
should be supervised and the results verified. Secondly, the dataset could be manually
labelled by outlining the UAV, if present, and creating the .txt file. Even though the latter is
especially time-consuming for large datasets, a decision was made to manually label all
the real data that was featured in the labelled dataset (5977 images) to avoid the error a
DL-assisted method can have.
Finally, a data augmentation strategy was used to create artificial images. Figure 9
shows three examples of the final produced pairs of images of UAVs. The method consisted
of firstly placing background-transparent images of UAVs in spatially and temporally
aligned images of backgrounds, and in the end applying random features such as a bright-
ness change to increase variety in the dataset. This approach is especially interesting in this
particular scenario as there are not many restrictions on the position of a UAV within an
image. For instance, this method would not be effective for a dataset of railed vehicles that
have to be placed on rails for an image to be plausible. The algorithm started by making a
random choice of an image pair for the background and of a UAV image. Due to the lack of
Drones 2024, 8, 650 11 of 25

publicly available datasets of paired UAV images, the algorithm took as input only an EO
image of the UAV. The corresponding IR image was created by the algorithm by applying
a transformation to the EO image. This gave it a random greyscale intensity, between a
range of values, and applied a random level of blurriness to its outline. This approach was
selected after performing some tests that reached the conclusion that this was the method
that created IR images of the UAV most similar to the ones from the IR sensor used in the
flight experiments. Then, it randomly chose the size of the UAV in the frame, followed
by the random selection of its position in the image. It also incorporated the options to
rotate the UAV, and change the brightness, contrast, and blurriness level of the produced
image. The background images used were obtained by the TASE 200 both during Flight
Experiment A and during extra experiments at UVIC-CfAR. As for the UAV images, eight
different models that included quadcopters, a hexacopter, and a fixed wing were used. This
process has the advantage of automatically producing the labels in the YOLO format.

(a) (b) (c)

(d) (e) (f)

Figure 9. Artificial image pair creation algorithm: (a–c) EO images. (d–f) IR images.

To sum up the labelled dataset, there were a total of five variations of spatially and
temporally aligned images: EO, IR, IR with bias removed, Pixel Fused, and Pixel Fused
with bias removed. Figure 10 shows three frame examples of the dataset. The image size is
variable in a few pixels due to the image alignment, but is approximately 640 × 512 for all
images. From the total, some are images with no UAVs, and the remaining ones have a total
of 11 different aircraft. About 20% of the images are artificially created. The comparison of
the architectures is more legitimate by having the datasets identical apart from in image
type or data fusion methodology.
In this project, to have as much data in the training set as possible because the dataset
was relatively small, the 80-10-10 partition for training, validation, and test sets was used.
Drones 2024, 8, 650 12 of 25

(a) (b) (c)

(d) (e) (f)

(g) (h) (i)

(j) (k) (l)

(m) (n) (o)

Figure 10. Dataset examples: (a–c) EO images. (d–f) IR images. (g–i) IR images with bias removed.
(j–l) Pixel Fused images. (m–o) Pixel Fused images with bias removed.
Drones 2024, 8, 650 13 of 25

3.3. Inference Dataset

The videos that were used to evaluate all of the different systems on inference were
the videos from Flight Experiments A that were not used to build the labelled dataset and
the videos from Flight Experiment C. The videos were separated into segments of interest
to discard the asynchronous video segments, avoid testing in videos with sudden camera
movements caused by the operator, and isolate the desired testing features and conditions.
These include the separation of the segments into the UAV range, based on the number of
pixels the UAV occupied in the images, and type of background, depending on whether the
UAV was above or below the local horizon. The cases with a blurry UAV and partially cut
UAV were also isolated. Finally, to assess the system robustness to intra-class variation and
inter-class variation, segments featuring an aircraft that was not used to train the detectors
and segments with the presence of birds, respectively, were isolated.

4. Results and Discussion

This section describes the detection and tracking results obtained by the system. Firstly,
the training and testing processes of the YOLOv7 detector for the five different variations of
the labelled dataset are explained. A comparison with the YOLOv7-tiny model is conducted.
Secondly, the models are tested independently on the inference dataset to evaluate and
benchmark their performance. Thirdly, both data fusion architectures are tested on the
inference dataset, and the average results presented. These sections also include the isolated
testing and comparison of the systems in specific challenging cases for UAV detection.

4.1. Detector on Labelled Dataset

The five different labelled datasets were used in YOLOv7 model training sessions
using the same parameters for all, for comparison reasons. Three different GPUs were
used: the NVIDIA GeForce RTX 4080, NVIDIA GeForce RTX 2070 SUPER, and the NVIDIA
GeForce RTX 3050.
The resulting models were evaluated using the test set images. The results are pre-
sented for precision, recall, Mean Average Precision at 0.5 ([email protected]), where the IoU
threshold used is 0.5, and Mean Average Precision at [.5:.95] (mAP@[.5:.95]), where a set of
thresholds from 0.5 to 0.95 with a step of 0.05 are averaged. Table 2 shows the YOLOv7
results for 500 epochs, where BR stands for Bias Removed.

Table 2. YOLOv7 test set results for 500 epochs.

Model Precision Recall [email protected] mAP@[.5:.95]

EO 0.860 0.827 0.839 0.599
IR 0.894 0.846 0.885 0.656
IR BR 0.887 0.860 0.886 0.646
Pixel Fused 0.893 0.873 0.900 0.622
Pixel Fused BR 0.886 0.866 0.896 0.614
Average 0.884 0.854 0.885 0.627

It is possible to conclude that all models had a similar performance on the respective
test sets. The lower values for mAP@[.5:.95] mean that the bounding boxes outlined by the
detectors were not always exactly placed, even if the detection of the UAV was correct.
As for the differences between datasets, both the IR and both the Pixel Fused models
always outperformed the EO model, although it was not a significant improvement. One
possible explanation for this is the extra information an EO model has to learn, since it
involves colours, while the other models deal with intensities. Additionally, one present
case in this dataset that benefited the IR and Pixel Fused models was when the UAV was
below the local horizon, thus having a textured background. Although this was not found
in great abundance, this may have led to FNs for the EO detector, especially if the UAV had
similar colours to the scene.
Drones 2024, 8, 650 14 of 25

Finally, it is relevant to compare the IR and Pixel Fused models with the corresponding
models with bias removed. In general, the metrics showed a better performance for
the original models. Firstly, the bias-removal algorithm might have introduced noise in
the images, deteriorating the results. Secondly, the bias estimation and then removal
procedure might have removed pixel intensity from the UAV, reducing its highlight and
thus disturbing the detection process. Finally, it is possible that the bias inherent in the
original images had no negative effect on the results. This could be because the gradients
were not fixed for all the images, that is, the bias mask was not constant, and due to the fact
that the dataset included variety in the position of the UAV in the frame. Since no significant
improvements were observed, further tests consider the models without bias removal.
By examining the output images that are a part of the test set, it was possible to isolate
the operational conditions that contributed to an increase in FPs and FNs leading to a
decrease in precision and recall, respectively. For precision, the models often mistook
birds for UAVs, besides producing some FPs in the presence of background objects such as
houses. For recall, the majority of the FNs occured when the UAV was flying below the
local horizon with a textured background. The IR sensor presented higher robustness to
this scenario. Apart from these impacts, there were also other conditions with particular
interest in the context of this project, such as when the UAV appeared blurry or partially
cut in the image, and intra-class variation. This can be tested with the data from Flight
Experiment C that includes an aircraft not featured in the dataset for training.
Additional training sessions were performed using the YOLOv7-tiny model. This is a
similar approach to the YOLOv7 model but this configuration takes a reduced number of
parameters, thus using less GPU memory, which makes it faster and less resource-intensive.
The tests and results using the YOLOv7-tiny model are relevant in case the system is
implemented onboard an aircraft. In this scenario, the computing system used needs to be
changed to an embedded system, which limits the frame rate since its parallel processing
ability is significantly constrained.
Table 3 shows precision, recall, and mAP results as presented for the YOLOv7 model.
In this case, an additional column with the percentage of decrease in the processing time of
the YOLOv7-tiny model, when compared to the YOLOv7 model, is presented.

Table 3. YOLOv7-tiny test set results for 300 epochs.

Time per
Model Precision Recall [email protected] mAP@[.5:.95] Image
Variation (%)
EO 0.856 0.813 0.823 0.544 −47.6
IR 0.877 0.835 0.873 0.615 −67.1
Pixel Fused 0.878 0.855 0.872 0.550 −55.7
Average 0.870 0.834 0.856 0.570 −56.8

When compared to the results from Table 2, accuracy decreases when using the
YOLOv7-tiny model, even though it is not very significantly. In terms of processing time,
however, the decrease obtained is on average 56.8% across datasets. These results are
relevant for a real-time implementation of the system onboard an aircraft, which requires
the models to run on an embedded system.
Since further testing in this project was performed offline using an NVIDIA GeForce
RTX 4080, which is a relatively fast GPU, it was decided to continue with the YOLOv7
models, thus favouring performance metrics instead of the frame rate.

4.2. Overfitting
Overfitting is a concerning problem in a detection system. In particular, for this project,
overfitting was carefully examined since the detector was trained using an original and
relatively small dataset, and some precautions were also taken to prevent it. Firstly, the
Drones 2024, 8, 650 15 of 25

validation set of data was used to perform cross-validation. This process consisted of the
constant evaluation of the models using the validation set during the training process in
order to assess their capacity to generalize to different data. Secondly, data augmentation
was used mainly by activating the YOLOv7 built-in augmentation option. This includes the
application of methods such as translation, cropping, noise, brightness, contrast, saturation,
and Gaussian blur to the images during the training stage. Finally, there was a strict control
over the overall number of training epochs used for each model, and the training sessions
were stopped when no significant improvements were verified. Since, as shown in Table 2
for a training session similar to all five datasets, the models presented good results, showing
the ability to make accurate predictions for the validation data, and also for new data in
the test set, it was concluded that they were not overfitting. To further prove this for the
EO, IR, and Pixel Fused models, independent tests in inference were conducted using
video segments with more variety in certain conditions. Taking this into account, it was
concluded that the models were not overfitting, despite not always being robust to all
conditions and variables. For the goal of this study, which is a proof of concept for data
fusion techniques, it was decided that the models in their present condition were adequate
and sufficient.

4.3. Independent Model Testing

Independent model testing refers to the testing of the system on the inference dataset
using each one of the sensors separately and without any data fusion methodology asso-
ciated. The aim is to benchmark the performances of just the detector, the detector plus
tracker, and the ECC algorithm. The results obtained using the NVIDIA GeForce RTX 4080
GPU are presented separately from Flight Experiments A and C. This is due to the fact that
the labelled dataset includes frames taken from other videos during Flight Experiment A,
sharing similar conditions in terms of lighting and background. In this way, the segments
from Flight Experiment A are used for the comparison of data fusion architectures and
the effectiveness of algorithms, and the segments from Flight Experiment C to test system
robustness to different conditions not featured in the training data.
Table 4 shows the average detection and tracking results of precision, recall, frame
rate, and the average number of Identification Switches (IDSs) for every 100 frames of
the tracker.

Table 4. Independent model testing on the detector and tracker for Flight Experiments A and C.

Nr. of Frame Rate IDS per

FE Data Range Precision Recall
Frames (fps) 100 Frames
close 2736 0.956 0.983 97.6 3.408
A EO medium 5721 0.962 0.995 101.3 2.483
far 4645 0.937 0.878 101.2 3.896
close 2736 0.940 0.977 99.2 5.783
A IR medium 5721 0.960 0.963 103.0 4.398
far 4645 0.905 0.955 100.4 11.779
close 2226 0.950 0.638 105.1 5.809
medium 8680 0.986 0.770 101.8 1.501
C EO
far 9087 0.986 0.634 105.0 1.734
v. far 2812 0.932 0.111 117.0 1.798
close 2226 0.964 0.455 111.4 1.565
medium 8680 0.943 0.773 103.9 4.354
C IR
far 9087 0.919 0.717 104.8 9.071
v. far 2812 0.928 0.245 115.8 3.872

It is observable that the results for Flight Experiment A are better in terms of recall, in
general, due to the frequent presence of birds, which can lead to FPs. In turn, as expected,
the results on the video segments from Flight Experiment C are better for precision due to
Drones 2024, 8, 650 16 of 25

the variety in conditions and UAV models that these data have, which were not included in
the dataset and that can lead to FNs. In fact, the predominance of video segments with the
UAV below the local horizon is purposely much higher for Flight Experiment C, since it is
one of the target conditions to be analysed by the data fusion methods. One other factor to
consider is the lighting condition of the environment, which was recorded predominantly
around twilight, where the sky is exceptionally bright. This led the images below the local
horizon to be very dark in contrast with the sky, and so the UAV was often undetectable,
even to the naked eye, as depicted in Figure 11. This affected images provided by both
sensors. When comparing the performance of both sensors separately, it is possible to
conclude that, for Flight Experiment A, both precision and recall, and also frame rate, are
generally better for the EO sensor. For Flight Experiment C, these results are more variable
depending on the range. In terms of range, the performance is better for the medium
and far ranges than for the close range. In particular, for Flight Experiment C, the recall
obtained using the IR sensor for the close range is lower because in most frames the UAV is
closer than the focal distance of the sensor, hence appearing very blurry in the images. For
the very far range in Flight Experiment C, both models underperform. This leads to the
conclusion that this range limits the system.

(a) (b)
Figure 11. UAV recorded at twilight: (a) EO image. (b) IR image.

As for the average number of IDSs per 100 frames, it is generally lower for the EO
sensor than for the IR. Visual analysis of the output videos with IDs led to the conclusion
that there are mainly three reasons for the missed tracks of the tracker. Firstly, there is the
case when the UAV is not detected at all, leading to FNs. Here, if it happens in consecutive
frames and the UAV is lost from the list of tracks, the tracker assigns it a new ID. Secondly,
the tracker incorrectly assigns an ID to an object when it is detected in a series of successive
frames, such as when there are FPs on birds that follow a trajectory. Finally, abrupt camera
movements can negatively impact the tracker performance, even when subtle. This effect
is more significant at longer ranges because the predicted bounding box is smaller, and so
the probability of camera movements leading to a miss in overlap of consecutive bounding
boxes is higher.
As for the frame rate, the system is able to process from 97.6 to 117.0 frames every
second. This is a relative value that highly depends on the hardware used.

4.3.1. ECC Algorithm

The inclusion of the ECC algorithm in the system, considering translation as the motion
model, to attempt compensating for camera motion, showed a dependency on feature-rich
and textured backgrounds, for both image types. The transformation of the images and
bounding boxes to the frame of reference was successful when the UAV was below the
local horizon, with more details and an abundance of motionless objects, or at a close
range. At closer range, however, the effect of camera movement was less noticeable due
to the higher bounding box size that led to a higher likelihood of overlap for consecutive
frames. In fact, analysis of the output videos of the independent models showed that for
this case the tracker failed more often due to the FP and FN detections. On the other hand,
Drones 2024, 8, 650 17 of 25

the performance deteriorated for images with the sky as background or moving objects,
especially when the UAV was flying at a long range. In both cases, the algorithm increased
the system frame rate.
Given the mentioned reasons, the algorithm was not considered beneficial for the
present study and further tests do not include its application. It is important to emphasize
that implementations of the ECC algorithm have shown acceptable results and improve-
ments in the tracker performance, namely in the project developed in [26]. Although it was
discarded for the present work, it is still regarded as valuable tool for image alignment,
and its implementation worthwhile for different contexts. Therefore, the ECC should be
re-tested for an online implementation of the system onboard an aircraft, which is more
susceptible to sudden camera movements that cannot be filtered out.

4.3.2. Target Operational Conditions

Finally, the target cases mentioned were analysed separately.
The results showed both sensors managed to accurately make predictions of the
UAV when it is blurry or partially cut, and so independent models showed a high system
robustness in these situations. One example for each is depicted in Figure 12.

(a) (b)

Figure 12. Independent model detection and tracking on higher robustness target cases: (a) EO
blurry UAV image. (b) IR blurry UAV image. (c) EO partially cut UAV image. (d) IR partially cut
UAV image.

As for intra-class variation, some conclusions were drawn from the analysis of the
output video segments of the particular case of the aircraft that was not included in the
training data, for independent models. In general, precision remains similar, which means
the number of FPs was not highly impacted, as expected. However, for recall, the detector
fails more often in predicting a UAV that was not included in the labelled dataset. However,
in most cases, it makes a successful detection, but with a lower confidence score.
One of the most frequent problems that object detectors face is the existence of similar
objects in the images that do not belong to the class being detected. In the case of UAV
detection, birds are the main concern. The analysis of this case on the independent models
Drones 2024, 8, 650 18 of 25

showed that there was a decrease in average precision for both models that did not depend
on UAV range, even though the value for recall remained similar to the average.
Finally, there was a decrease in average recall for the textured background case, and the
performance of the detector was much better when the UAV was above the local horizon,
having the sky as the immediate background. Even so, the case where the IR model made a
detection but the EO model failed was more common because the IR signature of the UAV
was highlighted against the background. Conversely, the EO images had more detail at a
closer range and so the detection was more likely.
Thus, the particularly relevant cases to analyse with the implementation of the data
fusion methodologies are the intra-class variation, presence of birds, and textured back-
ground scenarios, due to the lower robustness the independent models presented. One
example for each of these scenarios is depicted in Figure 13. Both in the intra-class and
textured background target cases, the figure shows an example with a successful detection
in the EO sensor and a FN in the IR sensor, which the data fusion methodologies aim to
eliminate. For the presence of birds case, both sensors have a FP detection on a bird besides
the UAV.

(a) (b)

(e) (f)

Figure 13. Independent model detection and tracking on lower robustness target cases: (a) EO
intra-class variation image. (b) IR intra-class variation image. (c) EO presence of birds image. (d) IR
presence of birds image. (e) EO textured background image. (f) IR textured background image.
Drones 2024, 8, 650 19 of 25

4.4. Data Fusion Architecture Testing

The implementation of the decision- and pixel-level methodologies focuses on the
improvement in precision and recall when compared to the independent models. It is
important to note that the video segments from Flight Experiment A were not suitable to
test the pixel-level data fusion architecture due to often degraded video alignment.
Table 5 shows the average results of the detector and tracker for Flight Experiments
A and C. Here, for the decision-level architecture, the designation EO-IR refers to the
case that has the EO model as the head model and the IR model is used for confirmation,
and vice versa. In the pixel-level data fusion case, only tested for Flight Experiment C,
the comparison is made with both the EO and IR independent models separately, in the
format EO|IR.

Table 5. Data fusion testing on the detector and tracker for Flight Experiments A and C.

Precision Recall
Frame Rate IDS per
FE Data Range Precision Variation Recall Variation
(fps) 100 Frames
(%) (%)
close 0.999 +4.3 0.979 −0.4 91.1 0.000
A EO-IR medium 0.999 +3.7 0.988 −0.7 93.1 0.494
far 0.996 +5.9 0.951 +7.3 82.2 3.885
close 0.992 +5.2 0.979 +0.3 89.8 0.395
A IR-EO medium 0.997 +3.7 0.989 +2.6 92.4 0.681
far 0.992 +8.7 0.952 −0.3 84.1 3.007
close 0.999 +4.9 0.634 −0.4 81.5 0.679
medium 0.999 +1.3 0.808 +3.8 87.2 0.427
C EO-IR
far 0.995 +0.9 0.719 +8.4 78.9 1.662
v. far 0.994 +6.1 0.182 +7.1 76.1 1.180
close 0.995 +3.1 0.572 +11.7 77.1 0.679
medium 0.992 +4.9 0.822 +4.9 87.0 0.532
C IR-EO
far 0.989 +7.0 0.752 +3.5 81.7 1.431
v. far 0.975 +4.7 0.241 −0.4 75.8 0.600
close 0.943 −0.70|−2.10 0.429 −21.0|−2.70 113.4 4.253
medium 0.940 −4.50|+0.10 0.577 −16.5|−16.9 107.4 5.219
C Pixel Fused
far 0.954 −3.30|+1.70 0.413 −15.8|−15.5 113.7 2.190
v. far 0.984 +5.20|+5.60 0.123 +1.30|−12.1 117.1 0.397

For the decision-level architecture, precision and recall increase on average 3.9% and
3.6%, respectively, for the EO-IR configuration, that is, when the EO results are occasionally
complemented by the IR results, when compared to the EO independent model. As
for the IR-EO configuration, there is an average increase in precision of 5.3% and recall
of 3.2% when compared to the IR independent model. The average lower results for
precision and recall for Flight Experiment C are due to the fact that the independent models
also have lower results for this data, and it does not directly mean that the algorithm is
underperforming. In fact, in terms of percentages, the improvements that the algorithm
manages to accomplish are similar between flight experiments. In some cases, even though
there is a significant increase in precision, it comes at the cost of a reduction in recall. This
compromise might or might not be worthwhile depending on the system requirements.
One limitation of this architecture is that it is always conditioned by the performance of the
sensors independently. This influences mainly recall because if, for instance, both cameras
happen to have a FN, the system with the decision-level data fusion will preserve the FN
and keep recall unchanged.
For the pixel-level approach, the results for precision and recall underperform when
compared to both decision-level architectures for all ranges and, in most cases, do not
show improvements when compared to the use of each sensor independently. The main
factors influencing the results are the lighting conditions that were experienced during
Drones 2024, 8, 650 20 of 25

Flight Experiment C, that led the fusion to produce images that appear distinct from the
ones featured in the dataset, and the imperfect video alignment. In fact, it is possible to
conclude from the analysis of the output videos that the performance of the model is worse
in the frames of the video segments when the alignment starts to fail, mostly in terms of
recall, such as the one illustrated in Figure 14a. Evidently, the longer the range of the UAV,
the more significant the impact of a failure in alignment is. As can be seen in the example
in Figure 14b, in some of the video segments, a failure in alignment can cause a complete
miss in overlap of the UAV from the two sensors. Here, the models frequently produced
FPs by detecting two UAVs, as opposed to FN detections.

(a) (b)
Figure 14. Alignment failure on Pixel Fused images: (a) Vertical shift of input images to Fusion-
GAN. (b) Significant vertical shift of input images leading to complete UAV overlap miss on Pixel
Fused images.

As for the number of IDSs per 100 frames, for the decision-level architectures, on
average, this value is reduced in all cases presented, namely due to the reduction of FPs.
For the pixel-level architecture, no pattern is verified for the number of IDSs per 100 frames
since precision and recall also either suffer an increase or decrease.
For frame rate, since the decision-level algorithm only resorts to the confirmation
model if necessary, the average only drops from 6.5 to 34.3 fps for the close, medium, and
far ranges, despite presenting a high variance. For the pixel-level fused models, when
compared to the independent models, there is a lower processing time. However, this is
not taking into account the fusion time.

Target Operational Conditions

In terms of target conditions, the results for the data fusion architectures for the
isolated conditions with low system robustness on independent models are studied.
First, the intra-class variation case contributes to decreasing the average recall of
the decision-level architecture, that is, this case has lower recall than the average. Even
so, compared with the independent models, these architectures show a more significant
improvement. In fact, the average recall has an increase of 4.1% for the EO-IR configuration
and of 4.2% for the IR-EO configuration. With the exception of the very far range limit of
the system, the decision-level architecture shows robustness to intra-class variation. For the
pixel-level data fusion, in the intra-class variation scenario, the results are comparable to the
average, and the system still underperforms for recall. Figure 15 shows the corresponding
images to Figure 13a,b, with the application of the data fusion methodologies.
Drones 2024, 8, 650 21 of 25

(a) (b) (c)

Figure 15. Data fusion detection and tracking on the intra-class variation target case: (a) EO-IR
architecture. (b) IR-EO architecture. (c) Pixel-level fused architecture.

One of the independent models fails at detection of the UAV in the intra-class variation
scenario, but the decision-level data fusion algorithm manages to make the detection in
both cases and recall is improved. In this example, the pixel-level data fusion misses the
detection. This specific case was found to be common in the tests.
In the presence of birds, the decision-level data fusion algorithm successfully manages
to improve precision. On average, this metric was improved by about 8.2% and 11.3% for
the EO-IR and IR-EO configurations, respectively. Since this scenario is only significant for
Flight Experiment A, the pixel-level architecture could not be tested for this case. Figure 16
shows the application of the decision-level data fusion architectures on the same example
as in Figure 13c,d, where both the independent models identify a bird as a UAV. As can be
seen in the images, these FPs are eliminated by the decision-level data fusion algorithm,
both having the EO or IR sensor as the head sensor.

(a) (b)
Figure 16. Data fusion detection and tracking with the presence of birds target case: (a) EO-IR
architecture. (b) IR-EO architecture.

Finally, for the textured background scenario, there are predominantly two reasons
affecting the results. First, when the range increases and the UAV size is limited to fewer
pixels, if the background is textured, the UAV becomes more easily mistaken by it. This
was also experienced by the naked eye during the flight experiments. Secondly, the lighting
condition of the scenario causes a decrease in recall, especially for the twilight videos
from Flight Experiment C. Besides these factors, the decision-level architecture manages
to decrease the number of FNs and hence improve recall, and increase precision, often
significantly. The recall is improved, on average, by about 5.2% and 6.9% for the EO-IR and
IR-EO configurations, when compared to the EO and IR independent models, respectively.
However, the performance results of the detector using the IR model are not as superior as
expected. In fact, one of the main reasons to choose the use of the IR sensor was its ability
to highlight the UAV in scenarios where it is easily mistaken for background through an EO
camera, and imperceptible to the naked eye. Furthermore, during the flight experiments,
Drones 2024, 8, 650 22 of 25

when the UAV was at a long range and flying below the local horizon, it was only possible
for the operator of the sensors to visually detect it through the IR sensor. Given this, the
lower recall can be due to mainly two factors. Firstly, the IR images are more granular
and not as sharp, when compared to the EO ones, which means that, at a long range,
regardless of the background, the detection task becomes more challenging. Secondly, the
results can also be conditioned by the quality of the dataset itself. Even though it is capable
of generalizing, since it did not include as many scenarios with textured background as
desired, recall in this scenario suffered a decrease. For the pixel-level architecture, no
consistent improvement is observed, even though some cases show significant increases
in precision and recall. Figure 17 shows the same frames as Figure 13e,f, but with the
implementation of the data fusion methodologies. In the decision-level data fusion cases,
the UAV is detected, even though one of the independent models fails to detect it. However,
the pixel level often fails to detect the aircraft with a textured background, which is in
accordance with the low values for recall shown in Table 5.

(a) (b) (c)

Figure 17. Data fusion detection and tracking on the textured background target case: (a) EO-IR
architecture. (b) IR-EO architecture. (c) Pixel-level fused architecture.

5. Conclusions
In this project, a detection and tracking system for small UAVs using an EO sensor and
an IR sensor was developed, and a comparison of the use of these sensors independently
with two data fusion methodologies was performed. To this end, flight experiments were
conducted for data collection. As a result, additional contributions of this project are
datasets of spatially and temporally aligned EO and IR data, one with labelled images
of UAVs, and one with UAV videos, not labelled. Finally, the system was evaluated for
different operational scenarios and target conditions, and tested using the flight test data
experimentally collected.
First, YOLOv7 tests were performed for five variations of the labelled dataset: EO
images, IR images, IR images with image bias removed, pixel-level fused images, and
pixel-level fused images with image bias removed. Similar results for the performance
metrics were obtained, achieving an average precision of 0.884, average recall of 0.854,
average [email protected] of 0.885, and average mAP@[.5:.95] of 0.627. The dataset variations that
had the bias removed were discarded since no significant improvements were observed.
Next, the detection and tracking tests were conducted, with the addition of the Byte-
Track tracker to the system, on the inference dataset, using the independent EO and IR
models to benchmark the performance of the sensors. Average results were presented for
different ranges, and target conditions were identified. Both sensors exhibited acceptable
performance in the blurry UAV and partially cut UAV scenarios, but precision suffered a
decrease in the presence of birds, and recall suffered a decrease in the intra-class variation
and textured background scenarios. Both decision-level and pixel-level data fusion method-
ologies were tested for the same video segments and target conditions. For the presence
of birds case, the decision-level architecture showed significant improvements in preci-
sion, and thus tracker performance, at the cost of a decrease in frame rate. For intra-class
Drones 2024, 8, 650 23 of 25

variation, the decision-level architecture showed improvements for precision, recall, and
tracker performance, but the pixel-level architecture under-performed, in general, when
compared to both independent models and the decision-level architecture. For the textured
background case, both precision and recall presented significant improvements with the
application of the decision-level architecture, as opposed to the pixel-level architecture
which was not considered beneficial.
To sum up, in general, the decision-level data fusion architecture showed the best
performance, and its use proved to be promising. Furthermore, there is potential for opti-
mization and enhancement in the implementation of this algorithm. Nevertheless, there is
a compromise between increasing precision and recall, and improving tracker performance,
with the decrease in frame rate. For this reason, the selection of the architecture depends
on each C-UAS system and its goals and requirements, and must be thoroughly considered.
The use of each sensor independently may also be beneficial for some scenarios. As for the
pixel-level architecture, even though in this study it showed poor results and, in general, it
is not considered advantageous, better equipment for a more accurate image alignment,
and the use of other fusion algorithms may lead to improvement in this methodology.
The conclusions drawn on this proof of concept research, by comparing architectures
for the EO and IR sensors, and data fusion methodologies, are a contribution to detection
and tracking tasks, and a basis for future work on C-UASs.

Author Contributions: Conceptualization, A.P., S.W., A.M. and A.S.; methodology, A.P.; investigation,
A.P.; resources, A.P. and S.W.; writing—original draft preparation, A.P.; writing—review and editing,
A.P., S.W., A.M. and A.S.; supervision, A.M. and A.S.; project administration, A.S.; funding acquisition,
A.S. All authors have read and agreed to the published version of the manuscript.
Funding: This work has been partially funded by Fundação para a Ciência e a Tecnologia (FCT)
under project LAETA Base Funding (https://doi.org/10.54499/UIDB/50022/2020). A.S. is grateful
for the NSERC Discovery and Canada Research Chair Programs.
Data Availability Statement: The original data created in the study are openly available in Mendeley
Data at https://doi.org/10.17632/sn9vy5c8sm.1.
Conflicts of Interest: The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript:

C-UAS Counter-UAS
CNN Convolutional Neural Network
DL Deep Learning
EO Electro-optical
FE Flight Experiment
FN False Negative
FP False Positive
GAN Generative Adversarial Network
IDS Identification Switch
IR Infrared
[email protected] Mean Average Precision at 0.5
mAP@[.5:.95] Mean Average Precision at [.5:.95]
MLP Multilayer Perceptron
RF Radio Frequency
UAS Unmanned Aerial System
UAV Unmanned Aerial Vehicle
UVIC-CfAR University of Victoria’s Center for Aerospace Research
Drones 2024, 8, 650 24 of 25

References
1. Worldwide Drone Incidents. Available online: https://www.dedrone.com/resources/incidents-new/all (accessed on 19 Jan-
uary 2024)
2. Castrillo, V.U.; Manco, A.; Pascarella, D.; Gigante, G. A Review of Counter-UAS Technologies for Cooperative Defensive Teams of
Drones. Drones 2022, 6, 65. [CrossRef]
3. Park, S.; Kim, H.T.; Lee, S.; Joo, H.; Kim, H. Survey on Anti-Drone Systems: Components, Designs, and Challenges. IEEE Access
2021, 9, 42635–42659. [CrossRef]
4. Wang, J.; Liu, Y.; Song, H. Counter-Unmanned Aircraft System(s) (C-UAS): State of the Art, Challenges, and Future Trends. IEEE
Aerosp. Electron. Syst. Mag. 2021, 36, 4–29. [CrossRef]
5. Wang, B.; Li, Q.; Mao, Q.; Wang, J.; Chen, C.L.P.; Shangguan, A.; Zhang, H. A Survey on Vision-Based Anti Unmanned Aerial
Vehicles Methods. Drones 2024, 8, 518. [CrossRef]
6. Sun, Y.; Abeywickrama, S.; Jayasinghe, L.; Yuen, C.; Chen, J.; Zhang, M. Micro-Doppler Signature-Based Detection, Classification,
and Localization of Small UAV with Long Short-Term Memory Neural Network. IEEE Trans. Geosci. Remote Sens. 2021, 59,
6285–6300. [CrossRef]
7. Passafiume, M.; Rojhani, N.; Collodi, G.; Cidronali, A. Modeling small UAV micro-doppler signature using millimeter-wave
FMCW radar. Electronics 2021, 10, 747. [CrossRef]
8. Yan, J.; Hu, H.; Gong, J.; Kong, D.; Li, D. Exploring Radar Micro-Doppler Signatures for Recognition of Drone Types. Drones 2021,
7, 280. [CrossRef]
9. Dogru, S.; Marques, L. Drone Detection Using Sparse Lidar Measurements. IEEE Robot. Autom. Lett. 2022, 7, 3062–3069. [CrossRef]
10. Nelega, R.; Belean, B.; Valeriu, R.; Turcu, F.; Puschita, E. Radio Frequency-Based Drone Detection and Classification using Deep
Learning Algorithms. In Proceedings of the 2023 International Conference on Software, Telecommunications and Computer
Networks (SoftCOM), Split, Croatia, 21–23 September 2023.
11. Fu, Y.; He, Z. Radio Frequency Signal-Based Drone Classification with Frequency Domain Gramian Angular Field and Convolu-
tional Neural Network. Drones 2024, 8, 511. [CrossRef]
12. Shi, Z.; Chang, X.; Yang, C.; Wu, Z.; Wu, J. An Acoustic-Based Surveillance System for Amateur Drones Detection and Localization.
IEEE Trans. Veh. Technol. 2020, 69, 2731–2739. [CrossRef]
13. Ahmed, C.A.; Batool, F.; Haider, W.; Asad, M.; Raza Hamdani, S.H. Acoustic Based Drone Detection Via Machine Learning. In
Proceedings of the 2022 International Conference on IT and Industrial Technologies (ICIT), Shanghai, China, 28–31 March 2022.
14. Zhao, J.; Zhang, J.; Li, D.; Wang, D. Vision-Based Anti-UAV Detection and Tracking. IEEE Trans. Intell. Transp. Syst. 2022, 23,
25323–25334. [CrossRef]
15. Ghosh, S.; Patrikar, J.; Moon, B.; Hamidi, M.M.; Scherer, S. AirTrack: Onboard Deep Learning Framework for Long-Range Aircraft
Detection and Tracking. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London,
UK, 29 May–2 June 2023.
16. Coluccia, A.; Fascista, A.; Schumann, A.; Sommer, L.; Dimou, A.; Zarpalas, D.; Méndez, M.; de la Iglesia, D.; González, I.;
Mercier, J.P.; et al. Drone vs. Bird detection: Deep learning algorithms and results from a grand challenge. Sensors 2021, 21, 2824.
[CrossRef] [PubMed]
17. Ding, L.; Xu, X.; Cao, Y.; Zhai, G.; Yang, F.; Qian, L. Detection and tracking of infrared small target by jointly using SSD and
pipeline filter. Digit. Signal Process. Rev. J. 2021, 110, 102949. [CrossRef]
18. Fang, H.; Ding, L.; Wang, L.; Chang, Y.; Yan, L.; Han, J. Infrared Small UAV Target Detection Based on Depthwise Separable Resid-
ual Dense Network and Multiscale Feature Fusion. IEEE Trans. Instrum. Meas. 2022, 71, 1–20.
tim.2022.3198490. [CrossRef]
19. Wu, X.; Li, W.; Hong, D.; Tao, R.; Du, Q. Deep Learning for Unmanned Aerial Vehicle-Based Object Detection and Tracking: A
survey. IEEE Geosci. Remote Sens. Mag. 2022, 10, 91–124. [CrossRef]
20. Svanström, F.; Alonso-Fernandez, F.; Englund, C. Drone Detection and Tracking in Real-Time by Fusion of Different Sensing
Modalities. Drones 2022, 6, 317. [CrossRef]
21. Alldieck, T.; Bahnsen, C.H.; Moeslund, T.B. Context-aware fusion of RGB and thermal imagery for traffic monitoring. Sensors
2016, 16, 1947. [CrossRef] [PubMed]
22. Yang, L.; Ma, R.; Zakhor, A. Drone Object Detection Using RGB/IR Fusion. In Proceedings of the Symposium on Electronic
Imaging: Computational Imaging XX, Online, 17–20 January 2022.
23. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object
detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC,
Canada, 18–22 June 2023.
24. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. YOLOv7: ByteTrack: Multi-Object Tracking by
Associating Every Detection Box. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27
October 2022.
25. Evangelidis, G.D.; Psarakis, E.Z. Parametric image alignment using enhanced correlation coefficient maximization. IEEE Trans.
Pattern Anal. Mach. Intell. 2008, 30, 1858–1865. [CrossRef] [PubMed]
26. Lopes, J.P.D.; Suleman, A.; Figueiredo, M.A.T. Detection and Tracking of Non-Cooperative UAVs: A Deep Learning Moving-Object
Tracking Approach. MsC Thesis, Instituto Superior Técnico, Lisbon, Portugal, 2022.
Drones 2024, 8, 650 25 of 25

27. Sun, C.; Zhang, C.; Xiong, N. Infrared and visible image fusion techniques based on deep learning: A review. Electronics 2020, 9,
2162. [CrossRef]
28. Ma, J.; Yu, W.; Liang, P.; Li, C.; Jiang, J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Inf.
Fusion 2019, 48, 11–26. [CrossRef]
29. Szeliski, R. Computer Vision: Algorithms and Applications, 2nd ed.; Springer: New York, NY, USA, 2021; pp. 33–96.
30. Pedro, S.; Tomás, D.; Vale, J.L.; Suleman, A. Design and performance quantification of VTOL systems for a canard aircraft.
Aeronaut. J. 2021, 125, 1768–1791. [CrossRef]
31. Castellani, N.; Pedrosa, F.; Matlock, J.; Mazur, A.; Lowczycki, K.; Widera, P.; Zawadzki, K.; Lipka, K.; Suleman, A. Development
of a Series Hybrid Multirotor. In Proceedings of the 13th EASN International Conference on Innovation in Aviation & Space for
opening New Horizons, Salerno, Italy, 5–8 September 2023.
32. Zheng, Y.; Lin, S.; Kambhamettu, C.; Yu, J.; Kang, S.B. Single-Image Vignetting Correction. IEEE Trans. Pattern Anal. Mach. Intell.
2009, 31, 2243–2256. [CrossRef] [PubMed]
33. Zheng, Y.; Grossman, M.; Awate, S.; Gee, J. Automatic Correction of Intensity Nonuniformity From Sparseness of Gradient
Distribution in Medical Images. In Proceedings of the 12th International Conference on Medical Image Computing and Computer
Assisted Intervention, London, UK, 20–24 September 2009.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

SAP BDEx Config Guide
100% (2)
SAP BDEx Config Guide
101 pages
Detection of UAS Paper
No ratings yet
Detection of UAS Paper
31 pages
Coursera Certificate JavaScript Jquery and JSON
No ratings yet
Coursera Certificate JavaScript Jquery and JSON
1 page
Sensors 23 07650
No ratings yet
Sensors 23 07650
24 pages
Drone
No ratings yet
Drone
102 pages
Light Burn Docs
No ratings yet
Light Burn Docs
187 pages
User's Manual User's Manual
No ratings yet
User's Manual User's Manual
128 pages
A Survey of Computer Vision Methods For
No ratings yet
A Survey of Computer Vision Methods For
38 pages
Shafiq Khurram 2023 Thesis
No ratings yet
Shafiq Khurram 2023 Thesis
146 pages
From Stationary To Nonstationary UAVs Deep-Learnin
No ratings yet
From Stationary To Nonstationary UAVs Deep-Learnin
13 pages
Drones 07 00117 v2
No ratings yet
Drones 07 00117 v2
18 pages
A5E01428618-03 SITRANS CV en en-US
No ratings yet
A5E01428618-03 SITRANS CV en en-US
114 pages
Vision-Based Anti-UAV Detection and Tracking: Jie Zhao, Jingshu Zhang, Dongdong Li, Dong Wang
No ratings yet
Vision-Based Anti-UAV Detection and Tracking: Jie Zhao, Jingshu Zhang, Dongdong Li, Dong Wang
12 pages
Remotesensing 16 00879 v2
No ratings yet
Remotesensing 16 00879 v2
42 pages
Applsci 13 05409
No ratings yet
Applsci 13 05409
20 pages
Drones 07 00095
No ratings yet
Drones 07 00095
17 pages
Electronics: Fastuav-Net: A Multi-Uav Detection Algorithm For Embedded Platforms
No ratings yet
Electronics: Fastuav-Net: A Multi-Uav Detection Algorithm For Embedded Platforms
19 pages
Drones in Action A Comprehensive Analysis of Drone
No ratings yet
Drones in Action A Comprehensive Analysis of Drone
22 pages
RF-Enabled Deep-Learning-Assisted Drone Detection and Identification - An End-To-End Approach
No ratings yet
RF-Enabled Deep-Learning-Assisted Drone Detection and Identification - An End-To-End Approach
18 pages
Robust Low-Cost Drone Detection and Classification
No ratings yet
Robust Low-Cost Drone Detection and Classification
11 pages
NPM Administrator Guide
No ratings yet
NPM Administrator Guide
183 pages
Drones 07 00174 With Cover
No ratings yet
Drones 07 00174 With Cover
21 pages
UAv Detection Complex Backgrounds and Rainy Conditions
No ratings yet
UAv Detection Complex Backgrounds and Rainy Conditions
9 pages
Signals 04 00018
No ratings yet
Signals 04 00018
22 pages
Molchanov 2014
No ratings yet
Molchanov 2014
10 pages
System Namierzania Punktowego Sygnatury Termicznej
No ratings yet
System Namierzania Punktowego Sygnatury Termicznej
16 pages
RF-UAVNet High-Performance Convolutional Network For RF-Based Drone Surveillance Systems
No ratings yet
RF-UAVNet High-Performance Convolutional Network For RF-Based Drone Surveillance Systems
12 pages
AV FDTI - Audio Visualfusion For Dronethreatidentification
No ratings yet
AV FDTI - Audio Visualfusion For Dronethreatidentification
8 pages
Classification of UAV-to-Ground Targets Based On Enhanced Micro-Doppler Features Extracted Via PCA and Compressed Sensing
No ratings yet
Classification of UAV-to-Ground Targets Based On Enhanced Micro-Doppler Features Extracted Via PCA and Compressed Sensing
9 pages
MIcro Doppler Classification
No ratings yet
MIcro Doppler Classification
17 pages
Applsci 13 11320
No ratings yet
Applsci 13 11320
27 pages
Sensors 23 07650 With Cover
No ratings yet
Sensors 23 07650 With Cover
25 pages
Sensors 23 07190
No ratings yet
Sensors 23 07190
27 pages
Sei Daliy Eva 2020
No ratings yet
Sei Daliy Eva 2020
19 pages
UAV Target Detection Algorithm Based On Improved YOLOv8
No ratings yet
UAV Target Detection Algorithm Based On Improved YOLOv8
11 pages
Gro Chol Sky 2006
No ratings yet
Gro Chol Sky 2006
11 pages
Sensors 22 03896 v2
No ratings yet
Sensors 22 03896 v2
30 pages
Drones 07 00304 v2
No ratings yet
Drones 07 00304 v2
26 pages
An Improved Unauthorized Unmanned Aerial Vehicle Detection Algorithm Using Radiofrequency-Based Statistical Fingerprint Analysis
No ratings yet
An Improved Unauthorized Unmanned Aerial Vehicle Detection Algorithm Using Radiofrequency-Based Statistical Fingerprint Analysis
22 pages
Robust Low-Cost Drone Detection and Classification Using Convolutional Neural Networks in Low SNR Environments
No ratings yet
Robust Low-Cost Drone Detection and Classification Using Convolutional Neural Networks in Low SNR Environments
10 pages
UAV Sensors For Environmental Monitoring PDF
No ratings yet
UAV Sensors For Environmental Monitoring PDF
672 pages
Sensors
No ratings yet
Sensors
22 pages
Air TO Air UAV DETECTION
No ratings yet
Air TO Air UAV DETECTION
3 pages
An Improved YOLOv5 Method For Small Object
No ratings yet
An Improved YOLOv5 Method For Small Object
10 pages
Remotesensing 12 03035
No ratings yet
Remotesensing 12 03035
28 pages
Nam Joshi 2017 Unmanned Aerial Vehicle Localization Using Distributed Sensors
No ratings yet
Nam Joshi 2017 Unmanned Aerial Vehicle Localization Using Distributed Sensors
8 pages
YOLO Model-Based Target Detection Algorithm For UA
No ratings yet
YOLO Model-Based Target Detection Algorithm For UA
5 pages
Echoformer Transformer Architecture Based On Radar Echo Characteristics For UAV Detection
No ratings yet
Echoformer Transformer Architecture Based On Radar Echo Characteristics For UAV Detection
15 pages
Electronics 12 03664
No ratings yet
Electronics 12 03664
21 pages
UAV Visual Detection and Tracking Using Deep Neural Networks Conf Paper
No ratings yet
UAV Visual Detection and Tracking Using Deep Neural Networks Conf Paper
11 pages
Instagram-Social Media Image Sizes
No ratings yet
Instagram-Social Media Image Sizes
34 pages
Vision-Based UAV Detection and Tracking Using Deep Learning and Kalman Filterr
No ratings yet
Vision-Based UAV Detection and Tracking Using Deep Learning and Kalman Filterr
13 pages
Chen Et Al 2020 Low Altitude Protection Technology of Anti Uavs Based On Multisource Detection Information Fusion
No ratings yet
Chen Et Al 2020 Low Altitude Protection Technology of Anti Uavs Based On Multisource Detection Information Fusion
12 pages
Lightweight Air To Air Unmanned
No ratings yet
Lightweight Air To Air Unmanned
18 pages
Air-to-Air Visual Detection of Micro-UAVs An Experimental Evaluation of Deep Learning
No ratings yet
Air-to-Air Visual Detection of Micro-UAVs An Experimental Evaluation of Deep Learning
8 pages
Detection Algorithm For Detecting Dronesuavs
No ratings yet
Detection Algorithm For Detecting Dronesuavs
12 pages
2504.11967v22007.07396v2chen Et Al 2020 Low Altitude Protection Technology of Anti Uavs Based On Multisource Detection Information Fusion
No ratings yet
2504.11967v22007.07396v2chen Et Al 2020 Low Altitude Protection Technology of Anti Uavs Based On Multisource Detection Information Fusion
15 pages
Implementation of Detection and Tracking Mechanism For Small UAS
No ratings yet
Implementation of Detection and Tracking Mechanism For Small UAS
7 pages
251 - Combined - RF-Based - Drone - Detection - and - Classification
No ratings yet
251 - Combined - RF-Based - Drone - Detection - and - Classification
10 pages
A Real-Time UAV Object Detection System Design With FPGA Implementation
No ratings yet
A Real-Time UAV Object Detection System Design With FPGA Implementation
6 pages
PDF - Object Detection and Person Tracking Using Uav
No ratings yet
PDF - Object Detection and Person Tracking Using Uav
11 pages
Electronics 12 03141
No ratings yet
Electronics 12 03141
14 pages
Miniature UAV Radar System
No ratings yet
Miniature UAV Radar System
24 pages
Valuejet 1604
No ratings yet
Valuejet 1604
456 pages
Drone Detection Using Visual Analysis
No ratings yet
Drone Detection Using Visual Analysis
2 pages
Dokumen - Tips - ZXMP m820 Product Description Liberty Port 3112 Alarm Input Output Function
No ratings yet
Dokumen - Tips - ZXMP m820 Product Description Liberty Port 3112 Alarm Input Output Function
218 pages
Microprocessors Notes
No ratings yet
Microprocessors Notes
73 pages
DOC001152125
No ratings yet
DOC001152125
10 pages
LWUAVDet A Lightweight UAV Object Detection Network On Edge Devices
No ratings yet
LWUAVDet A Lightweight UAV Object Detection Network On Edge Devices
11 pages
Uropean Curriculum Vitae Format: Ersonal Information
No ratings yet
Uropean Curriculum Vitae Format: Ersonal Information
6 pages
Final Examination - Spring 2021 Semester Sajid Ali - 40760: Faculty of Engineering, Sciences and Technology
No ratings yet
Final Examination - Spring 2021 Semester Sajid Ali - 40760: Faculty of Engineering, Sciences and Technology
4 pages
Battlefy Player Guide
No ratings yet
Battlefy Player Guide
73 pages
CS 4038 - DM Course Outline (Fall 2021)
No ratings yet
CS 4038 - DM Course Outline (Fall 2021)
4 pages
DSGDSGDSG
No ratings yet
DSGDSGDSG
1 page
Sasvinaa Kandasamy (DSTR Final TP053388)
No ratings yet
Sasvinaa Kandasamy (DSTR Final TP053388)
34 pages
Dumpkiller: Latest It Exam Questions & Answers
No ratings yet
Dumpkiller: Latest It Exam Questions & Answers
8 pages
Case Study 2
No ratings yet
Case Study 2
4 pages
Database Systems A Pragmatic Approach 2nd Edition Elvis C. Foster Shripad Godbol Download PDF
No ratings yet
Database Systems A Pragmatic Approach 2nd Edition Elvis C. Foster Shripad Godbol Download PDF
57 pages
File Converter - Requirements
No ratings yet
File Converter - Requirements
6 pages
DFI Boards Catalog 2021 R2 - 210705 - 1028 - Web
No ratings yet
DFI Boards Catalog 2021 R2 - 210705 - 1028 - Web
32 pages
Slide of PSpice
No ratings yet
Slide of PSpice
63 pages
Docker Private Registry
No ratings yet
Docker Private Registry
4 pages
An Overview of Electronics and Communication
No ratings yet
An Overview of Electronics and Communication
18 pages
Name: - Date: - APCSA - Loops Worksheet 1
No ratings yet
Name: - Date: - APCSA - Loops Worksheet 1
2 pages
G9 Revision Work Sheet
No ratings yet
G9 Revision Work Sheet
3 pages
L310 Start Here Guide
No ratings yet
L310 Start Here Guide
4 pages
HollySys - Introduction V1.2 - 2021
No ratings yet
HollySys - Introduction V1.2 - 2021
42 pages
Multispectral Imaging: Unlocking the Spectrum: Advancements in Computer Vision
From Everand
Multispectral Imaging: Unlocking the Spectrum: Advancements in Computer Vision
Fouad Sabry
No ratings yet
Automatic Target Recognition: Advances in Computer Vision Techniques for Target Recognition
From Everand
Automatic Target Recognition: Advances in Computer Vision Techniques for Target Recognition
Fouad Sabry
No ratings yet
Geoffrey Riggs: Memorandum
No ratings yet
Geoffrey Riggs: Memorandum
7 pages

Infrared and Visible Camera Integration For Detection

Uploaded by

Infrared and Visible Camera Integration For Detection

Uploaded by

drones

1 Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal; [email protected]

Keywords: YOLOv7; ByteTrack; electro-optical sensor; infrared sensor; data fusion

Citation: Pereira, A.; Warwick, S.;

Drones 2024, 8, 650. https://doi.org/10.3390/drones8110650 https://www.mdpi.com/journal/drones

2. Detection and Tracking Architecture

Figure 1. Decision-level data fusion stages.

Figure 2. Pixel-level data fusion stages.

2.2. Data Fusion Methodology

(a) (b) (c)

3.1. Experimental Work

Figure 4. Experimental setup with highlight on the sensors.

Table 1. Sensor calibration results.

Sensor Calibration Matrix Distortion Coefficients

(a) (b) (c)

(d) (e) (f)

(a) (b) (c)

3.2. Labelled Dataset

(a) (b) (c)

(a) (b) (c)

(d) (e) (f)

(a) (b) (c)

(d) (e) (f)

(g) (h) (i)

(j) (k) (l)

(m) (n) (o)

3.3. Inference Dataset

4. Results and Discussion

4.1. Detector on Labelled Dataset

Table 2. YOLOv7 test set results for 500 epochs.

Model Precision Recall [email protected] mAP@[.5:.95]

Table 3. YOLOv7-tiny test set results for 300 epochs.

4.3. Independent Model Testing

Nr. of Frame Rate IDS per

4.3.1. ECC Algorithm

4.3.2. Target Operational Conditions

4.4. Data Fusion Architecture Testing

Target Operational Conditions

(a) (b) (c)

(a) (b) (c)

You might also like