Article
Infrared and Visible Camera Integration for Detection and
Tracking of Small UAVs: Systematic Evaluation
Ana Pereira 1, Stephen Warwick 2, Alexandra Moutinho 3 and Afzal Suleman 2,3,*
Abstract: Given the recent proliferation of Unmanned Aerial Systems (UASs) and the consequent
importance of counter-UASs, this project aims to perform the detection and tracking of small non-
cooperative UASs using Electro-optical (EO) and Infrared (IR) sensors. Two data integration tech-
niques, at the decision and pixel levels, are compared with the use of each sensor independently
to evaluate the system robustness in different operational conditions. The data are submitted to a
YOLOv7 detector merged with a ByteTrack tracker. For training and validation, additional efforts are
made towards creating datasets of spatially and temporally aligned EO and IR annotated Unmanned
Aerial Vehicle (UAV) frames and videos. These consist of the acquisition of real data captured from a
workstation on the ground, followed by image calibration, image alignment, the application of bias-
removal techniques, and data augmentation methods to artificially create images. The performance
of the detector across datasets shows an average precision of 88.4%, recall of 85.4%, and [email protected] of
88.5%. Tests conducted on the decision-level fusion architecture demonstrate notable gains in recall
and precision, although at the expense of lower frame rates. Precision, recall, and frame rate are not
improved by the pixel-level fusion design.
Related Work
Several distinct approaches to UAV detection have been introduced over the years, as
explored in [5]. The most frequently used sensors are RADAR, LiDAR, Electro-optical (EO)
and Infrared (IR) imaging cameras, RF based, and acoustic. Each of these sensors has been
widely utilized for the detection of objects in different applications, although not necessarily for UAVs, given the constraint imposed by their small size. The main characteristics of each sensor, with a focus on the detection of small objects, are described as follows:
• RADAR: RADAR transmits electromagnetic waves that are reflected by targets, with
a frequency range from 3 MHz to 300 GHz. The interpretation of the reflected rays
may determine the position and velocity of the objects. A significant issue with the use of RADAR for the detection of UAVs is their low RADAR cross-section, which may render them undetectable. However, because RADAR presents
a high robustness to weather and lighting conditions, research on micro-Doppler
signature-based methods has been conducted [6]. In [7], the micro-Doppler effect for
frequency-modulated continuous-wave (FMCW) RADAR applications is modelled,
showing high confidence rates for UAV class identification based on the number of
UAV motors. In [8], an X-band pulse-Doppler RADAR is used to compare the RADAR
signatures of fixed-wing UAVs with only puller blades, multirotor UAVs with only
lifting blades, and VTOL UAVs with both lifting and puller blades, which can help
identify UAV types.
• LiDAR: LiDAR shares the working principle of RADAR, although it operates at a higher frequency range, from 200 THz to 400 THz. It is also able to produce a 3D map of the
environment. Although it is less robust to weather, it is still a valuable tool to initialize
the position of the object in a detection system. In [9], a probabilistic analysis of the
detection of small UAVs in various scenarios is proposed.
• RF sensor: These passive sensors capture the signals used by a target to communicate
with the ground, making it possible to detect, locate and also, in some cases, identify
the aircraft. Apart from being robust to weather and lighting, one important feature of
RF sensors is the possibility of detecting the controller on the ground, which is relevant
in countering a threat. In [10], spectral–temporal localization and classification of
the RF signals of UAVs with a visual object detector approach are performed. This
shows promising results, even in noise interference situations, by processing the
spectrograms using the YOLOv5 object detector. In [11], a novel RF signal image
representation scheme that incorporates a convolutional neural network (CNN) is
implemented to perform UAV classification, achieving high classification accuracy
scores of 98.72% and 98.67% on two different datasets.
• Acoustic sensor: An acoustic sensor can detect, distinguish, and identify the sound
emitted by the engine and propellers of a UAV. By using a specific arrangement of
multiple microphones, the estimation of the azimuth and elevation of one or more
UAVs is possible. However, this sensor presents some limitations in terms of the
detection range and accuracy, and susceptibility to background noise interference,
even though it is a low-cost and accessible tool. In [12], good performance results of
the detection and localization of small UAVs are achieved by using an acoustic-based
surveillance system. In [13], a UAV detection system is proposed in which acoustic signatures are fed to two machine learning models, a Random Forest and a Multilayer Perceptron (MLP). The MLP model was considered the better solution for the detection of
complex and nonlinear acoustic features.
• EO camera: An EO camera allows for the detection of objects by capturing the light
reflected by them. Although intuitive to interpret and capable of providing detailed information on the surrounding environment, these sensors have low robustness to poorly lit scenes, namely at night, and to weather conditions such as rain and fog. For visual object detection, different computer vision algorithms
based on Deep Learning (DL) models have been developed. A comparison between
14 object detectors on a proposed visual UAV dataset is conducted in [14], drawing conclusions based on performance and processing time. In [15], a detection
and tracking system for small UAVs using a DL framework that performs image
alignment is proposed. Results with high-resolution images show a track probability
of more than 95% up to 700 m. The problem of distinguishing small UAVs from birds
is addressed in [16]. It concludes that object detectors benefit from being trained with
datasets that include various UAVs and birds in order to decrease the number of False
Positive (FP) detections on inference.
• IR sensor: IR sensors, which capture the thermal signature of objects, are widely explored in the military sector and are better suited to some challenging scenarios. Particularly with Long-Wave Infrared (LWIR) sensors, the thermal signatures emitted by
the batteries of UAVs can be detected. These sensors typically have a lower imaging
resolution and higher granularity, which limit their use independently. These issues
that result in a lack of texture and feature highlight are addressed in [17]. An im-
proved detector that drops low-resolution layers and enhances high-resolution layers
is proposed. It also includes a multi-frame filtering stage that consists of an adaptive
pipeline filter (APF) to reduce the FP rate, achieving a precision of more than 95%.
Promising results in small-UAV detection using an IR sensor are achieved in [18],
by learning the nonlinear mapping from an input image to the residual image and
highlighting the target UAV by subtracting these images.
In particular for vision-based object detection, which is one of the most relevant
tasks of computer vision, DL methods are commonly exploited [19]. The object detectors
take an image as input and output the bounding boxes of the detected objects with the
corresponding labels. In turn, multi-object trackers provide temporal continuity by connecting sets of detections across frames, assigning an ID to each object without any previous knowledge of its location.
To tackle different problems or to enhance performance and results, the use of more than one sensor is advantageous. For imaging sensors, decision-level and pixel-level
data fusion approaches have shown promising results for various applications. In [20], a
decision-level approach that bases the detection on the sensor with the highest confidence
score at the output stage is developed. In [21], context-aware fusion at pixel level is
performed after image segmentation for traffic surveillance applications. In this case, a merged image is created from the output of both cameras and then passed to the detector. One of the most relevant challenges of this approach is the requirement for spatially and temporally aligned data.
Publicly available datasets of spatially and temporally aligned EO and IR images
for the detection of UAVs are scarce. In [22], three methods to obtain paired EO and
IR images for the detection of cars and people are studied. These include a Generative
Adversarial Network (GAN) algorithm to generate images, a simulation environment,
and a combination of both. The results show a poor performance of the detector when
trained with the synthetic datasets and tested with real data. Even so, traditional data
augmentation techniques such as image manipulation, image erasing, and image mixing
can benefit a dataset and improve the detection results.
Small-object vision-based detection remains a challenging topic, despite extensive
progress over the years to improve both the processing time and accuracy of these systems.
Comparative research into sensors and sensor integration techniques is important.
The present work aims to perform the detection and tracking of small non-cooperative
UAVs using EO and IR imaging sensors. It makes a comparison between the use of each
sensor independently and two sensor fusion algorithms, at the decision and pixel levels.
Since the main objective is to use real data to evaluate the performance of the architectures
in different conditions and scenarios, a further goal is the construction of the necessary
spatially and temporally aligned EO and IR datasets of UAVs. This includes the creation of
artificial data and the acquisition of real data during flight experiments (FEs) performed at
the University of Victoria’s Center for Aerospace Research (UVIC-CfAR). By conducting
extensive robustness tests and validating the system using real flight data, a conclusion is drawn on the most appropriate method in terms of performance and real-time capabilities for the cases explored.
For the pixel-level data fusion, the raw pixels from multiple sources are combined, so
the fusion occurs on a pixel basis. This is an early-fusion method because it occurs before
image classification. In this case, in the first step, both images are merged into a single one
that preserves the relevant features of each, and then the resulting image is submitted to a
single detector and tracker. The steps of this method are depicted in Figure 2.
In this project, the object detector selected was YOLOv7, which uses a deep CNN to identify objects in images [23]. It belongs to the You Only Look Once (YOLO) family of real-time object detectors and represented the state of the art at the time. This choice was based on the improvements in accuracy, and especially in processing time, that the authors of YOLOv7 achieved. Even though more recent versions are now available, not enough literature had been produced on them at the time of this selection, so YOLOv7, which had been extensively reported, was chosen.
Despite the development and progress of object detectors in recent years, there are still
relevant challenges that were considered in this work. These include intra-class variation, where detectors may fail to detect objects of the same class that are not represented in the dataset,
and inter-class variation, where object detectors may fail to distinguish different classes.
Hardware requirements are also a relevant limitation since computer vision algorithms
often have a high demand for memory and lead to intensive training sessions.
As for the tracker, the state-of-the-art ByteTrack was selected [24]. ByteTrack is a
tracking-by-detection tracker that uses Intersection over Union (IoU) to associate detections
provided by the object detector with the tracks stored in memory. Based on the detection
results, the tracker creates an ID of the object and follows its trajectory in consecutive
frames, thus giving it the same ID, or a new instance, if it is a new different detected object.
It also uses a Kalman filter to make predictions on the position of the objects in the current
frames, given their location in the previous frames. As opposed to most trackers, ByteTrack keeps all the detections provided by the associated detector, including low-confidence ones, to increase robustness to cases of occlusion, motion blur, and changes in bounding box size. In this project, the tracking task refers to following the trajectory of a UAV in videos, not to following it during flight by having the sensors move autonomously.
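As a rough illustration of the IoU gating underlying this association step, the sketch below implements a greedy IoU matcher between predicted track boxes and new detections. It omits ByteTrack's two-stage handling of low-confidence boxes and its Kalman prediction; the function names and threshold are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def associate(track_boxes, det_boxes, iou_thresh=0.3):
    """Greedy IoU matching: returns matched (track_idx, det_idx) pairs and the
    indices of unmatched detections (candidates for new track IDs)."""
    pairs, used = [], set()
    for t, tbox in enumerate(track_boxes):
        scores = [(iou(tbox, d), j) for j, d in enumerate(det_boxes) if j not in used]
        if scores:
            best, j = max(scores)
            if best >= iou_thresh:
                pairs.append((t, j))
                used.add(j)
    unmatched_dets = [j for j in range(len(det_boxes)) if j not in used]
    return pairs, unmatched_dets
```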
Image alignment in consecutive frames was additionally considered for the system
to compensate for camera motion, before submission to the tracker. Here, Enhanced
Correlation Coefficient (ECC) maximization was chosen to estimate the parameters of the
motion models for the system [25]. This gradient-based iterative method is robust against
geometric and photometric distortions. Previous work developed at UVIC-CfAR showed
tracking improvements when this algorithm was applied [26].
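A minimal sketch of this alignment step using OpenCV's ECC implementation is shown below. It assumes greyscale frames, a Euclidean motion model, and the OpenCV 4.x Python binding of cv2.findTransformECC; it is not necessarily the exact configuration used in [26].

```python
import cv2
import numpy as np

def align_to_previous(prev_gray, curr_gray, iters=50, eps=1e-4):
    """Warp the current frame onto the previous one with ECC maximization."""
    warp = np.eye(2, 3, dtype=np.float32)          # Euclidean: rotation + translation
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, iters, eps)
    # OpenCV >= 4.x signature: inputMask=None, gaussFiltSize=5 (assumed values)
    _, warp = cv2.findTransformECC(prev_gray, curr_gray, warp,
                                   cv2.MOTION_EUCLIDEAN, criteria, None, 5)
    h, w = prev_gray.shape
    return cv2.warpAffine(curr_gray, warp, (w, h),
                          flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```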
3. Dataset
Two datasets were created due to the requirements imposed by the data fusion method-
ologies selected, using data collected during flight experiments: the labelled dataset and
the inference dataset. The labelled dataset was used to train object detectors, and included
spatially and temporally aligned variations of EO, IR, and Pixel Fused real and artificial
images. Each of these sub-datasets consisted of a total of 5977 labelled images of UAVs.
In turn, the inference dataset consisted of EO, IR, and Pixel Fused real videos that were
spatially and temporally aligned but not labelled, that totalled 35,907 frames.
The EO camera sensor was a SONY FCB-EX1020 PAL and the IR camera sensor was a
FLIR TAU 640 PAL. These sensors were integrated in a TASE 200 gimbal, so there was a fixed
displacement between them. This displacement was measured to be 50 mm ± 0.5 mm. The
camera parameters were controlled using ViewPoint software. Each sensor was integrated
with a low-latency video encoder, Antrica ANT-1772, that can stream in both RTSP and
MPEG TS formats over an Ethernet connection. The software Neptune Guard was used to
configure each encoder. The streams were displayed and recorded with Neptune Player, which has a very-low-latency viewer. The OBS program was also used simultaneously for
recording and live streaming the same videos.
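For reference, reading and recording one of these RTSP streams can be done with a few lines of OpenCV, as sketched below. The stream URL, output file name, and frame rate are placeholders rather than the actual encoder settings used in the experiments.

```python
import cv2

def record_stream(url="rtsp://192.168.1.10:554/stream", out_path="eo_capture.mp4", fps=25.0):
    """Capture an RTSP stream and write it to an MP4 file (illustrative sketch)."""
    cap = cv2.VideoCapture(url)
    if not cap.isOpened():
        raise RuntimeError(f"Could not open stream {url}")
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:                      # stream dropped or ended
            break
        writer.write(frame)
    cap.release()
    writer.release()
```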
The equipment was on a workstation on the ground capturing the aircraft in the air
and not mounted onboard an aircraft. The operator manually moved the gimbal to include
the aircraft in its field of view, and no autonomous gimbal movement was used. This means
that the cameras were fixed with respect to each other, but the gimbal was moving to record
the aircraft. Both sensors were always set to start recording at the same time to contribute
to the temporal alignment of the frames.
In case this system is mounted onboard an aircraft in future work, the main equipment
change that needs to be made is replacing the computing unit with an embedded system.
The weight of the whole system to be deployed must also be taken into consideration. The
selection of the detector aircraft will depend on this payload weight, which would include
an embedded system, the TASE 200 gimbal, the two video encoders, batteries to power
these components, digital datalinks for communication with the ground station, and the
necessary cables.
First, sensor calibration was performed for the two sensors independently to determine
their geometric parameters since image alignment is essential to guarantee the success
of the FusionGAN algorithm. The MATLAB simple camera calibration app was used for
the calibration. Based on the Pinhole Perspective camera model, the intrinsic, distortion,
and extrinsic parameters of each sensor were estimated [29]. The calibration results are
presented in Table 1, where the most significant difference is observed in the distortion
coefficients, namely for radial distortion. In fact, the distortion that the IR sensor causes in
the images is noticeable to the naked eye in some cases. However, this is found to have a
negligible impact on the alignment of the UAVs at longer ranges.
One aspect to consider was that the IR sensor could not capture a defined image
at a close range. For this reason, a calibration board printed in the standard A4 or A3
sizes would be too small to be captured, appearing blurry through the IR sensor, which
would make the calibration process impossible. Instead, a 10 × 7 calibration board with
15 cm × 15 cm squares, totalling 150 cm × 105 cm, was built to capture the images, as can
be seen in Figure 5.
Figure 5. Calibration procedure using the calibration board created: (a) EO image at close range.
(b) EO image at far range. (c) IR image at close range. (d) IR image at far range.
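Although the calibration here was performed with the MATLAB app, an equivalent pinhole calibration can be sketched with OpenCV as below. The 9 × 6 interior-corner pattern follows from the 10 × 7 board of 15 cm squares; the image folder and termination criteria are illustrative assumptions.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)    # interior corners of the 10 x 7 board
square = 0.15       # square size in metres
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calibration/ir/*.png"):   # placeholder folder name
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1),
                                   (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 1e-3))
        obj_pts.append(objp)
        img_pts.append(corners)

# Intrinsic matrix K and distortion coefficients of the pinhole model.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
print(f"RMS reprojection error: {rms:.3f} px")
```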
The goal of the most relevant flight experiments was to collect data containing as
much variety in operational conditions as possible, in the form of videos in MP4 format recorded at 25 fps, to integrate into the dataset used to train the object detector models and test the system.
Flight Experiment A was an operation at UVIC-CfAR with the aircraft Mini-E [30],
illustrated in Figure 6a. This aircraft was flying in circles passing through the waypoints
shown in Figure 7a. Additional flights of a DJI Mavic 2, displayed in Figure 6b, were performed on the same day to add variety with a more common aircraft. This UAV flew along straight lines, as can be seen in Figure 7b. The total recorded flight time was 51 min 21 s for the Mini-E and 22 min 28 s for the DJI Mavic 2. During the
postprocessing stage of the raw videos, it was concluded that the EO and IR frames were
not always totally aligned. This happened mostly when one of the sensors of the gimbal
automatically adjusted a camera parameter, creating a lag in the transmission of the videos.
Figure 6. UAVs captured during flight experiments: (a) FE A—Mini-E. (b) FE A—DJI Mavic 2. (c) FE
B—MIMIQ. (d) FE C—DJI Inspire 1. (e) FE D—Zeta FX-61 Phantom Wing. (f) FE D—DJI Mini 3 Pro.
Flight Experiment B was also an operation at UVIC-CfAR with the main goal of
gathering data on the hybrid multirotor [31], shown in Figure 6c. The flights captured
include vertical take-off, hovering, and landing. Since this flight experiment was performed
inside a gymnasium and hence the background is similar in all frames, only a total of 2 min
10 s was recorded.
Flight Experiment C was conducted at UVIC-CfAR with the main goal of gathering
footage to include in the dataset to test the system on inference. This includes both the same
DJI Mavic 2 of Flight Experiment A, shown in Figure 6b, to assess the different performance
results of the system with the same aircraft under different conditions, and also a DJI Inspire 1, which can be seen in Figure 6d, to test the robustness of the system to intra-class variation. The flight paths chosen for this experiment were the same for both UAVs, which followed straight lines as depicted in Figure 7c. The DJI Mavic 2 and DJI Inspire 1 were recorded
during a total of 19 min 18 s and 24 min 35 s, respectively.
Additional data were collected on Flight Experiment D conducted at Instituto Superior
Técnico (IST) with a TeAx ThermalCapture Fusion Zoom. The frames provided include a
Zeta FX-61 Phantom Wing and a DJI Mini 3 Pro, as can be seen in Figure 6e,f, respectively.
In general, the data include frames with the UAV blurred or partially cut, the presence
of birds in some frames, frames above and below the local horizon, and a background with
variety in objects, especially trees, houses, and farming tools. There is also variety in the
range of the UAV and its position in the frames. In terms of lighting, variety includes bright
images taken during summer days, indoor images with artificial lighting, and images taken
at twilight in autumn.
A noticeable brightness nonuniformity was observed in the IR frames, with the intensity varying gradually across the images. Computer vision algorithms may be significantly impacted by this kind of effect. First, this was considered a vignette effect, that is, a brightness attenuation away from the image center, and treated as such, using a method to estimate it from a single image [32]. This approach did not yield satisfactory vignette function estimations for all the images. For this reason, it was then considered an intensity nonuniformity, that is, a bias that can be caused by illumination changes, thus taking the perturbation as a variation in intensity that does not follow a specific distribution [33]. With this strategy, the results for gradient estimation, and hence bias removal from the images, were acceptable. One result
example of this procedure is shown in Figure 8. The usage of such an approach can generate
more noise, so, for research purposes, the IR dataset was duplicated and the bias-removal
algorithm applied to the copy. A Pixel Fused dataset with the FusionGAN algorithm using
as inputs the EO and IR with bias removed was also created. The goal was to evaluate the
effect of the image correction by comparing the performance of the object detectors with
the original images and bias-corrected images.
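For illustration only, a much simpler bias correction than the gradient-sparseness method of [33] is sketched below: it models the nonuniformity as a smooth multiplicative field estimated with a large Gaussian blur and divides it out. The parameter values are arbitrary, and this is not the algorithm applied to the dataset.

```python
import cv2
import numpy as np

def remove_smooth_bias(ir_gray, sigma=75):
    """Crude illustrative bias correction: estimate a smooth intensity field with a
    large Gaussian blur and divide it out (simplified stand-in for [33])."""
    img = ir_gray.astype(np.float32) + 1.0          # avoid division by zero
    bias = cv2.GaussianBlur(img, (0, 0), sigma)     # very smooth low-frequency field
    corrected = img / bias * bias.mean()            # flatten, then restore mean level
    return np.clip(corrected, 0, 255).astype(np.uint8)
```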
For the Flight Experiment D data, the software used for data capture performed
image alignment, so the frame selection was solely supervised to guarantee the elimination
of outliers.
For the real data captured, the UAV had to be labelled by outlining its bounding box,
assigning it a class, and producing a .txt file in the YOLO format. The accuracy of the labels
associated with each object has a significant impact on the performance of an object detector.
Two main labelling strategies were considered. First, by sending the data to another object
detector trained for the same purpose, a DL-assisted methodology could be implemented.
The label files could be created using the output detections of this extra detector. However,
this method has an additional source of error, since it is dependent on the precision, recall,
and accuracy of this detector, which might have significant impacts on the dataset. For
this reason, although being more time-efficient, to guarantee a good outcome, this method
should be supervised and the results verified. Secondly, the dataset could be manually
labelled by outlining the UAV, if present, and creating the .txt file. Even though the latter is
especially time-consuming for large datasets, a decision was made to manually label all
the real data featured in the labelled dataset (5977 images) to avoid the errors a DL-assisted method can introduce.
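For reference, one line of a YOLO-format label encodes the class index followed by the normalized centre coordinates and size of the bounding box, as in the sketch below; the file name and box values are illustrative.

```python
def yolo_label_line(box, img_w, img_h, class_id=0):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into one line of a
    YOLO-format .txt label: class x_center y_center width height, all normalized."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2.0 / img_w
    yc = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# e.g. a 640 x 512 frame with a UAV at pixels (300, 200)-(340, 230):
with open("frame_000123.txt", "w") as f:
    f.write(yolo_label_line((300, 200, 340, 230), 640, 512) + "\n")
```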
Finally, a data augmentation strategy was used to create artificial images. Figure 9
shows three examples of the final produced pairs of images of UAVs. The method consisted
of first placing background-transparent images of UAVs onto spatially and temporally aligned background images, and then applying random transformations, such as a brightness change, to increase variety in the dataset. This approach is especially interesting in this
particular scenario as there are not many restrictions on the position of a UAV within an
image. For instance, this method would not be effective for a dataset of railed vehicles that
have to be placed on rails for an image to be plausible. The algorithm started by making a
random choice of an image pair for the background and of a UAV image. Due to the lack of
publicly available datasets of paired UAV images, the algorithm took as input only an EO
image of the UAV. The corresponding IR image was created by the algorithm by applying
a transformation to the EO image, giving it a random greyscale intensity within a range of values and applying a random level of blurriness to its outline. This approach was selected after tests showed that it produced IR images of the UAV most similar to those from the IR sensor used in the flight experiments. Then, the algorithm randomly chose the size of the UAV in the frame, followed
by the random selection of its position in the image. It also incorporated the options to
rotate the UAV, and change the brightness, contrast, and blurriness level of the produced
image. The background images used were obtained by the TASE 200 both during Flight
Experiment A and during extra experiments at UVIC-CfAR. As for the UAV images, eight
different models that included quadcopters, a hexacopter, and a fixed wing were used. This
process has the advantage of automatically producing the labels in the YOLO format.
Figure 9. Artificial image pair creation algorithm: (a–c) EO images. (d–f) IR images.
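A condensed sketch of this pair-generation procedure is given below, using PIL; it covers the random placement, the pseudo-IR synthesis (random greyscale intensity plus a blurred outline), and the shared YOLO label, while omitting the rotation and contrast options. File paths and parameter ranges are illustrative, not the values used to build the dataset.

```python
import random
from PIL import Image, ImageEnhance, ImageFilter

def make_pair(eo_bg_path, ir_bg_path, uav_png_path):
    """Paste a transparent UAV cut-out onto an aligned EO/IR background pair and
    synthesise a pseudo-IR UAV from the EO cut-out (illustrative sketch)."""
    eo_bg = Image.open(eo_bg_path).convert("RGB")
    ir_bg = Image.open(ir_bg_path).convert("L")
    uav = Image.open(uav_png_path).convert("RGBA")

    # Random size and position (shared by both modalities to keep the pair aligned).
    scale = random.uniform(0.03, 0.15)
    w = max(8, int(eo_bg.width * scale))
    uav = uav.resize((w, max(4, int(w * uav.height / uav.width))))
    x = random.randint(0, eo_bg.width - uav.width)
    y = random.randint(0, eo_bg.height - uav.height)

    # Pseudo-IR UAV: flat random grey intensity with a blurred outline.
    grey = random.randint(60, 220)
    ir_uav = Image.new("L", uav.size, grey)
    mask = uav.split()[3].filter(ImageFilter.GaussianBlur(random.uniform(0.5, 2.0)))

    eo_out = eo_bg.copy(); eo_out.paste(uav, (x, y), uav)
    ir_out = ir_bg.copy(); ir_out.paste(ir_uav, (x, y), mask)
    eo_out = ImageEnhance.Brightness(eo_out).enhance(random.uniform(0.8, 1.2))

    # YOLO label (class 0) shared by the pair.
    label = (f"0 {(x + uav.width / 2) / eo_bg.width:.6f} "
             f"{(y + uav.height / 2) / eo_bg.height:.6f} "
             f"{uav.width / eo_bg.width:.6f} {uav.height / eo_bg.height:.6f}")
    return eo_out, ir_out, label
```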
To sum up the labelled dataset, there were a total of five variations of spatially and
temporally aligned images: EO, IR, IR with bias removed, Pixel Fused, and Pixel Fused
with bias removed. Figure 10 shows three frame examples of the dataset. The image size is
variable by a few pixels due to the image alignment, but is approximately 640 × 512 for all images. From the total, some images contain no UAVs, and the remaining ones feature a total of 11 different aircraft. About 20% of the images are artificially created. Keeping the datasets identical apart from the image type or data fusion methodology makes the comparison of the architectures fairer.
In this project, to have as much data in the training set as possible because the dataset
was relatively small, the 80-10-10 partition for training, validation, and test sets was used.
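A minimal sketch of such a split is shown below; the fixed seed is an assumption for reproducibility and not necessarily how the partition was produced here.

```python
import random

def split_80_10_10(items, seed=0):
    """Shuffle and partition a list of image paths into training, validation, and
    test subsets with 80-10-10 proportions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```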
Figure 10. Dataset examples: (a–c) EO images. (d–f) IR images. (g–i) IR images with bias removed.
(j–l) Pixel Fused images. (m–o) Pixel Fused images with bias removed.
It is possible to conclude that all models had a similar performance on the respective
test sets. The lower values for mAP@[.5:.95] mean that the bounding boxes outlined by the
detectors were not always exactly placed, even if the detection of the UAV was correct.
As for the differences between datasets, both the IR and both the Pixel Fused models
always outperformed the EO model, although it was not a significant improvement. One
possible explanation for this is the extra information an EO model has to learn, since it
involves colours, while the other models deal with intensities. Additionally, one case present in this dataset that benefited the IR and Pixel Fused models was when the UAV was below the local horizon and thus had a textured background. Although this case was not abundant, it may have led to FNs for the EO detector, especially if the UAV had colours similar to the scene.
Finally, it is relevant to compare the IR and Pixel Fused models with the corresponding
models with bias removed. In general, the metrics showed a better performance for
the original models. Firstly, the bias-removal algorithm might have introduced noise in
the images, deteriorating the results. Secondly, the bias estimation and then removal
procedure might have removed pixel intensity from the UAV, reducing its highlight and
thus disturbing the detection process. Finally, it is possible that the bias inherent in the
original images had no negative effect on the results. This could be because the gradients
were not fixed for all the images, that is, the bias mask was not constant, and due to the fact
that the dataset included variety in the position of the UAV in the frame. Since no significant
improvements were observed, further tests consider the models without bias removal.
By examining the output images that are a part of the test set, it was possible to isolate
the operational conditions that contributed to an increase in FPs and FNs leading to a
decrease in precision and recall, respectively. For precision, the models often mistook
birds for UAVs, besides producing some FPs in the presence of background objects such as
houses. For recall, the majority of the FNs occurred when the UAV was flying below the
local horizon with a textured background. The IR sensor presented higher robustness to
this scenario. Apart from these impacts, there were also other conditions with particular
interest in the context of this project, such as when the UAV appeared blurry or partially
cut in the image, and intra-class variation. This can be tested with the data from Flight
Experiment C that includes an aircraft not featured in the dataset for training.
Additional training sessions were performed using the YOLOv7-tiny model. This is a
similar approach to the YOLOv7 model but this configuration takes a reduced number of
parameters, thus using less GPU memory, which makes it faster and less resource-intensive.
The tests and results using the YOLOv7-tiny model are relevant in case the system is
implemented onboard an aircraft. In this scenario, the computing system used needs to be
changed to an embedded system, which limits the frame rate since its parallel processing
ability is significantly constrained.
Table 3 shows precision, recall, and mAP results as presented for the YOLOv7 model.
In this case, an additional column with the percentage of decrease in the processing time of
the YOLOv7-tiny model, when compared to the YOLOv7 model, is presented.
Model         Precision   Recall   [email protected]   mAP@[.5:.95]   Time per Image Variation (%)
EO            0.856       0.813    0.823     0.544          −47.6
IR            0.877       0.835    0.873     0.615          −67.1
Pixel Fused   0.878       0.855    0.872     0.550          −55.7
Average       0.870       0.834    0.856     0.570          −56.8
When compared to the results from Table 2, accuracy decreases when using the YOLOv7-tiny model, although not very significantly. In terms of processing time,
however, the decrease obtained is on average 56.8% across datasets. These results are
relevant for a real-time implementation of the system onboard an aircraft, which requires
the models to run on an embedded system.
Since further testing in this project was performed offline using an NVIDIA GeForce
RTX 4080, which is a relatively fast GPU, it was decided to continue with the YOLOv7
models, thus favouring performance metrics instead of the frame rate.
4.2. Overfitting
Overfitting is a concerning problem in a detection system. In particular, for this project,
overfitting was carefully examined since the detector was trained using an original and
relatively small dataset, and some precautions were taken to prevent it. Firstly, the validation set was used to perform cross-validation. This process consisted of the
constant evaluation of the models using the validation set during the training process in
order to assess their capacity to generalize to different data. Secondly, data augmentation
was used mainly by activating the YOLOv7 built-in augmentation option. This includes the
application of methods such as translation, cropping, noise, brightness, contrast, saturation,
and Gaussian blur to the images during the training stage. Finally, there was a strict control
over the overall number of training epochs used for each model, and the training sessions
were stopped when no significant improvements were verified. Since, as shown in Table 2 for training sessions conducted similarly across all five datasets, the models presented good results, showing the ability to make accurate predictions for the validation data and also for new data in the test set, it was concluded that they were not overfitting. To further verify this for the
EO, IR, and Pixel Fused models, independent tests in inference were conducted using
video segments with more variety in certain conditions. Taking this into account, it was
concluded that the models were not overfitting, despite not always being robust to all
conditions and variables. For the goal of this study, which is a proof of concept for data
fusion techniques, it was decided that the models in their present condition were adequate
and sufficient.
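The stopping criterion described above was applied manually; a patience-style check on the validation [email protected] history, as sketched below, captures the same idea. This is not the YOLOv7 training code, and the patience and minimum-improvement values are illustrative.

```python
def should_stop(val_map_history, patience=20, min_delta=0.001):
    """Return True when the best validation [email protected] has not improved by at least
    min_delta over the last `patience` epochs."""
    if len(val_map_history) <= patience:
        return False
    best_recent = max(val_map_history[-patience:])
    best_before = max(val_map_history[:-patience])
    return best_recent < best_before + min_delta
```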
Table 4. Independent model testing on the detector and tracker for Flight Experiments A and C.
It is observable that the results for Flight Experiment A are generally better in terms of recall than precision, due to the frequent presence of birds, which can lead to FPs. In turn, as expected, the results on the video segments from Flight Experiment C are better for precision than recall, due to the variety in conditions and UAV models in these data, which were not included in the training dataset and can lead to FNs. In fact, the predominance of video segments with the
UAV below the local horizon is purposely much higher for Flight Experiment C, since it is
one of the target conditions to be analysed by the data fusion methods. One other factor to
consider is the lighting condition of the environment, which was recorded predominantly
around twilight, where the sky is exceptionally bright. This led the images below the local
horizon to be very dark in contrast with the sky, and so the UAV was often undetectable,
even to the naked eye, as depicted in Figure 11. This affected images provided by both
sensors. When comparing the performance of both sensors separately, it is possible to
conclude that, for Flight Experiment A, both precision and recall, and also frame rate, are
generally better for the EO sensor. For Flight Experiment C, these results are more variable
depending on the range. In terms of range, the performance is better for the medium
and far ranges than for the close range. In particular, for Flight Experiment C, the recall
obtained using the IR sensor for the close range is lower because in most frames the UAV is
closer than the focal distance of the sensor, hence appearing very blurry in the images. For
the very far range in Flight Experiment C, both models underperform. This leads to the
conclusion that this range limits the system.
Figure 11. UAV recorded at twilight: (a) EO image. (b) IR image.
As for the average number of IDSs per 100 frames, it is generally lower for the EO
sensor than for the IR. Visual analysis of the output videos with IDs led to the conclusion
that there are mainly three reasons for the missed tracks of the tracker. Firstly, there is the
case when the UAV is not detected at all, leading to FNs. Here, if it happens in consecutive
frames and the UAV is lost from the list of tracks, the tracker assigns it a new ID. Secondly,
the tracker incorrectly assigns an ID to an object when it is detected in a series of successive
frames, such as when there are FPs on birds that follow a trajectory. Finally, camera movements can negatively impact the tracker performance, even when subtle. This effect
is more significant at longer ranges because the predicted bounding box is smaller, and so
the probability of camera movements leading to a miss in overlap of consecutive bounding
boxes is higher.
As for the frame rate, the system is able to process from 97.6 to 117.0 frames every
second. This is a relative value that highly depends on the hardware used.
When the ECC image alignment algorithm was applied before tracking, the performance deteriorated for images with the sky as background or with moving objects, especially when the UAV was flying at a long range. In both cases, the algorithm also decreased the system frame rate.
Given the mentioned reasons, the algorithm was not considered beneficial for the
present study and further tests do not include its application. It is important to emphasize
that implementations of the ECC algorithm have shown acceptable results and improve-
ments in the tracker performance, namely in the project developed in [26]. Although it was
discarded for the present work, it is still regarded as a valuable tool for image alignment, and its implementation may be worthwhile in different contexts. Therefore, the ECC should be
re-tested for an online implementation of the system onboard an aircraft, which is more
susceptible to sudden camera movements that cannot be filtered out.
Figure 12. Independent model detection and tracking on higher robustness target cases: (a) EO
blurry UAV image. (b) IR blurry UAV image. (c) EO partially cut UAV image. (d) IR partially cut
UAV image.
As for intra-class variation, some conclusions were drawn from the analysis of the
output video segments of the particular case of the aircraft that was not included in the
training data, for independent models. In general, precision remains similar, which means
the number of FPs was not highly impacted, as expected. However, for recall, the detector
fails more often in predicting a UAV that was not included in the labelled dataset. Even so, in most cases, it makes a successful detection, but with a lower confidence score.
One of the most frequent problems that object detectors face is the existence of similar
objects in the images that do not belong to the class being detected. In the case of UAV
detection, birds are the main concern. The analysis of this case on the independent models
showed that there was a decrease in average precision for both models that did not depend
on UAV range, even though the value for recall remained similar to the average.
Finally, there was a decrease in average recall for the textured background case, and the
performance of the detector was much better when the UAV was above the local horizon,
having the sky as the immediate background. Even so, the case where the IR model made a
detection but the EO model failed was more common because the IR signature of the UAV
was highlighted against the background. Conversely, the EO images had more detail at a
closer range and so the detection was more likely.
Thus, the particularly relevant cases to analyse with the implementation of the data
fusion methodologies are the intra-class variation, presence of birds, and textured back-
ground scenarios, due to the lower robustness the independent models presented. One
example for each of these scenarios is depicted in Figure 13. Both in the intra-class and
textured background target cases, the figure shows an example with a successful detection
in the EO sensor and a FN in the IR sensor, which the data fusion methodologies aim to
eliminate. For the presence of birds case, both sensors have a FP detection on a bird besides
the UAV.
Figure 13. Independent model detection and tracking on lower robustness target cases: (a) EO
intra-class variation image. (b) IR intra-class variation image. (c) EO presence of birds image. (d) IR
presence of birds image. (e) EO textured background image. (f) IR textured background image.
Table 5. Data fusion testing on the detector and tracker for Flight Experiments A and C.
FE   Data          Range    Precision   Precision       Recall   Recall          Frame Rate   IDS per
                                        Variation (%)            Variation (%)   (fps)        100 Frames
A    EO-IR         close    0.999       +4.3            0.979    −0.4            91.1         0.000
A    EO-IR         medium   0.999       +3.7            0.988    −0.7            93.1         0.494
A    EO-IR         far      0.996       +5.9            0.951    +7.3            82.2         3.885
A    IR-EO         close    0.992       +5.2            0.979    +0.3            89.8         0.395
A    IR-EO         medium   0.997       +3.7            0.989    +2.6            92.4         0.681
A    IR-EO         far      0.992       +8.7            0.952    −0.3            84.1         3.007
C    EO-IR         close    0.999       +4.9            0.634    −0.4            81.5         0.679
C    EO-IR         medium   0.999       +1.3            0.808    +3.8            87.2         0.427
C    EO-IR         far      0.995       +0.9            0.719    +8.4            78.9         1.662
C    EO-IR         v. far   0.994       +6.1            0.182    +7.1            76.1         1.180
C    IR-EO         close    0.995       +3.1            0.572    +11.7           77.1         0.679
C    IR-EO         medium   0.992       +4.9            0.822    +4.9            87.0         0.532
C    IR-EO         far      0.989       +7.0            0.752    +3.5            81.7         1.431
C    IR-EO         v. far   0.975       +4.7            0.241    −0.4            75.8         0.600
C    Pixel Fused   close    0.943       −0.70|−2.10     0.429    −21.0|−2.70     113.4        4.253
C    Pixel Fused   medium   0.940       −4.50|+0.10     0.577    −16.5|−16.9     107.4        5.219
C    Pixel Fused   far      0.954       −3.30|+1.70     0.413    −15.8|−15.5     113.7        2.190
C    Pixel Fused   v. far   0.984       +5.20|+5.60     0.123    +1.30|−12.1     117.1        0.397
For the decision-level architecture, precision and recall increase on average by 3.9% and 3.6%, respectively, for the EO-IR configuration, that is, when the EO results are occasionally complemented by the IR results, compared to the EO independent model. As
for the IR-EO configuration, there is an average increase in precision of 5.3% and recall
of 3.2% when compared to the IR independent model. The lower average precision and recall for Flight Experiment C are due to the fact that the independent models also have lower results for these data; this does not directly mean that the algorithm is underperforming. In fact, in terms of percentages, the improvements that the algorithm
manages to accomplish are similar between flight experiments. In some cases, even though
there is a significant increase in precision, it comes at the cost of a reduction in recall. This
compromise might or might not be worthwhile depending on the system requirements.
One limitation of this architecture is that it is always conditioned by the performance of the
sensors independently. This influences mainly recall because if, for instance, both cameras
happen to have a FN, the system with the decision-level data fusion will preserve the FN
and keep recall unchanged.
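A simplified schematic of this EO-IR configuration is sketched below: the primary (EO) detections are kept when their confidence is sufficient, and the confirmation (IR) model is run otherwise. The detector interface and confidence threshold are assumptions for illustration; the complementing rule actually implemented in this work may differ in its details.

```python
def decision_level_fuse(eo_frame, ir_frame, primary, secondary, conf_thresh=0.5):
    """Sketch of decision-level fusion: `primary` and `secondary` are assumed to be
    callables returning lists of (box, confidence) tuples for one frame."""
    dets = primary(eo_frame)
    if dets and max(conf for _, conf in dets) >= conf_thresh:
        return dets, "primary"
    # Primary missed or was unsure: run the confirmation model on the IR frame.
    return secondary(ir_frame), "secondary"
```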
For the pixel-level approach, the results for precision and recall underperform when
compared to both decision-level architectures for all ranges and, in most cases, do not
show improvements when compared to the use of each sensor independently. The main
factors influencing the results are the lighting conditions experienced during Flight Experiment C, which led the fusion to produce images that appear distinct from the ones featured in the dataset, and the imperfect video alignment. In fact, it is possible to
conclude from the analysis of the output videos that the performance of the model is worse
in the frames of the video segments when the alignment starts to fail, mostly in terms of
recall, such as the one illustrated in Figure 14a. Evidently, the longer the range of the UAV,
the more significant the impact of a failure in alignment is. As can be seen in the example
in Figure 14b, in some of the video segments, a failure in alignment can cause a complete
miss in overlap of the UAV from the two sensors. Here, the models frequently produced
FPs by detecting two UAVs, as opposed to FN detections.
(a) (b)
Figure 14. Alignment failure on Pixel Fused images: (a) Vertical shift of input images to FusionGAN. (b) Significant vertical shift of input images leading to complete UAV overlap miss on Pixel
Fused images.
As for the number of IDSs per 100 frames, for the decision-level architectures, on
average, this value is reduced in all cases presented, namely due to the reduction of FPs.
For the pixel-level architecture, no pattern is observed for the number of IDSs per 100 frames, since precision and recall also vary in both directions.
For frame rate, since the decision-level algorithm only resorts to the confirmation
model if necessary, the average frame rate only drops by 6.5 to 34.3 fps for the close, medium, and far ranges, despite presenting a high variance. For the pixel-level fused models, when compared to the independent models, the processing time per frame is lower; however, this does not take the fusion time into account.
One of the independent models fails to detect the UAV in the intra-class variation scenario, but the decision-level data fusion algorithm manages to make the detection in
both cases and recall is improved. In this example, the pixel-level data fusion misses the
detection. This specific case was found to be common in the tests.
In the presence of birds, the decision-level data fusion algorithm successfully manages
to improve precision. On average, this metric was improved by about 8.2% and 11.3% for
the EO-IR and IR-EO configurations, respectively. Since this scenario is only significant for
Flight Experiment A, the pixel-level architecture could not be tested for this case. Figure 16
shows the application of the decision-level data fusion architectures on the same example
as in Figure 13c,d, where both the independent models identify a bird as a UAV. As can be
seen in the images, these FPs are eliminated by the decision-level data fusion algorithm,
with either the EO or the IR sensor as the head sensor.
Figure 16. Data fusion detection and tracking with the presence of birds target case: (a) EO-IR
architecture. (b) IR-EO architecture.
Finally, for the textured background scenario, there are predominantly two reasons
affecting the results. First, when the range increases and the UAV size is limited to fewer
pixels, if the background is textured, the UAV becomes more easily confused with it. This
was also experienced by the naked eye during the flight experiments. Secondly, the lighting
condition of the scenario causes a decrease in recall, especially for the twilight videos
from Flight Experiment C. Besides these factors, the decision-level architecture manages
to decrease the number of FNs and hence improve recall, and increase precision, often
significantly. The recall is improved, on average, by about 5.2% and 6.9% for the EO-IR and
IR-EO configurations, when compared to the EO and IR independent models, respectively.
However, the performance results of the detector using the IR model are not as superior as
expected. In fact, one of the main reasons to choose the use of the IR sensor was its ability
to highlight the UAV in scenarios where it is easily mistaken for background through an EO
camera, and imperceptible to the naked eye. Furthermore, during the flight experiments,
when the UAV was at a long range and flying below the local horizon, it was only possible
for the operator of the sensors to visually detect it through the IR sensor. Given this, the
lower recall can be due to mainly two factors. Firstly, the IR images are more granular
and not as sharp, when compared to the EO ones, which means that, at a long range,
regardless of the background, the detection task becomes more challenging. Secondly, the
results can also be conditioned by the quality of the dataset itself. Even though it enables the models to generalize, it did not include as many textured-background scenarios as desired, so recall in this scenario suffered a decrease. For the pixel-level architecture, no
consistent improvement is observed, even though some cases show significant increases
in precision and recall. Figure 17 shows the same frames as Figure 13e,f, but with the
implementation of the data fusion methodologies. In the decision-level data fusion cases,
the UAV is detected, even though one of the independent models fails to detect it. However,
the pixel-level architecture often fails to detect the aircraft with a textured background, which is in
accordance with the low values for recall shown in Table 5.
5. Conclusions
In this project, a detection and tracking system for small UAVs using an EO sensor and
an IR sensor was developed, and a comparison of the use of these sensors independently
with two data fusion methodologies was performed. To this end, flight experiments were
conducted for data collection. As a result, additional contributions of this project are
datasets of spatially and temporally aligned EO and IR data, one with labelled images of UAVs and one with unlabelled UAV videos. Finally, the system was evaluated for
different operational scenarios and target conditions, and tested using the flight test data
experimentally collected.
First, YOLOv7 tests were performed for five variations of the labelled dataset: EO
images, IR images, IR images with image bias removed, pixel-level fused images, and
pixel-level fused images with image bias removed. Similar results for the performance
metrics were obtained, achieving an average precision of 0.884, average recall of 0.854,
average [email protected] of 0.885, and average mAP@[.5:.95] of 0.627. The dataset variations that
had the bias removed were discarded since no significant improvements were observed.
Next, the detection and tracking tests were conducted, with the addition of the Byte-
Track tracker to the system, on the inference dataset, using the independent EO and IR
models to benchmark the performance of the sensors. Average results were presented for
different ranges, and target conditions were identified. Both sensors exhibited acceptable
performance in the blurry UAV and partially cut UAV scenarios, but precision suffered a
decrease in the presence of birds, and recall suffered a decrease in the intra-class variation
and textured background scenarios. Both decision-level and pixel-level data fusion method-
ologies were tested for the same video segments and target conditions. For the presence
of birds case, the decision-level architecture showed significant improvements in preci-
sion, and thus tracker performance, at the cost of a decrease in frame rate. For intra-class
variation, the decision-level architecture showed improvements for precision, recall, and
tracker performance, but the pixel-level architecture underperformed, in general, when
compared to both independent models and the decision-level architecture. For the textured
background case, both precision and recall presented significant improvements with the
application of the decision-level architecture, as opposed to the pixel-level architecture
which was not considered beneficial.
To sum up, in general, the decision-level data fusion architecture showed the best
performance, and its use proved to be promising. Furthermore, there is potential for opti-
mization and enhancement in the implementation of this algorithm. Nevertheless, there is a compromise between increasing precision, recall, and tracker performance and the associated decrease in frame rate. For this reason, the selection of the architecture depends
on each C-UAS system and its goals and requirements, and must be thoroughly considered.
The use of each sensor independently may also be beneficial for some scenarios. As for the
pixel-level architecture, even though it showed poor results in this study and is not, in general, considered advantageous, better equipment for more accurate image alignment and the use of other fusion algorithms may lead to improvements in this methodology.
The conclusions drawn from this proof-of-concept research, comparing architectures for the EO and IR sensors and data fusion methodologies, are a contribution to detection and tracking tasks and a basis for future work on C-UASs.
Author Contributions: Conceptualization, A.P., S.W., A.M. and A.S.; methodology, A.P.; investigation,
A.P.; resources, A.P. and S.W.; writing—original draft preparation, A.P.; writing—review and editing,
A.P., S.W., A.M. and A.S.; supervision, A.M. and A.S.; project administration, A.S.; funding acquisition,
A.S. All authors have read and agreed to the published version of the manuscript.
Funding: This work has been partially funded by Fundação para a Ciência e a Tecnologia (FCT)
under project LAETA Base Funding (https://doi.org/10.54499/UIDB/50022/2020). A.S. is grateful
for the NSERC Discovery and Canada Research Chair Programs.
Data Availability Statement: The original data created in the study are openly available in Mendeley
Data at https://doi.org/10.17632/sn9vy5c8sm.1.
Conflicts of Interest: The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
C-UAS Counter-UAS
CNN Convolutional Neural Network
DL Deep Learning
EO Electro-optical
FE Flight Experiment
FN False Negative
FP False Positive
GAN Generative Adversarial Network
IDS Identification Switch
IR Infrared
[email protected] Mean Average Precision at 0.5
mAP@[.5:.95] Mean Average Precision at [.5:.95]
MLP Multilayer Perceptron
RF Radio Frequency
UAS Unmanned Aerial System
UAV Unmanned Aerial Vehicle
UVIC-CfAR University of Victoria’s Center for Aerospace Research
References
1. Worldwide Drone Incidents. Available online: https://www.dedrone.com/resources/incidents-new/all (accessed on 19 January 2024).
2. Castrillo, V.U.; Manco, A.; Pascarella, D.; Gigante, G. A Review of Counter-UAS Technologies for Cooperative Defensive Teams of
Drones. Drones 2022, 6, 65. [CrossRef]
3. Park, S.; Kim, H.T.; Lee, S.; Joo, H.; Kim, H. Survey on Anti-Drone Systems: Components, Designs, and Challenges. IEEE Access
2021, 9, 42635–42659. [CrossRef]
4. Wang, J.; Liu, Y.; Song, H. Counter-Unmanned Aircraft System(s) (C-UAS): State of the Art, Challenges, and Future Trends. IEEE
Aerosp. Electron. Syst. Mag. 2021, 36, 4–29. [CrossRef]
5. Wang, B.; Li, Q.; Mao, Q.; Wang, J.; Chen, C.L.P.; Shangguan, A.; Zhang, H. A Survey on Vision-Based Anti Unmanned Aerial
Vehicles Methods. Drones 2024, 8, 518. [CrossRef]
6. Sun, Y.; Abeywickrama, S.; Jayasinghe, L.; Yuen, C.; Chen, J.; Zhang, M. Micro-Doppler Signature-Based Detection, Classification,
and Localization of Small UAV with Long Short-Term Memory Neural Network. IEEE Trans. Geosci. Remote Sens. 2021, 59,
6285–6300. [CrossRef]
7. Passafiume, M.; Rojhani, N.; Collodi, G.; Cidronali, A. Modeling small UAV micro-doppler signature using millimeter-wave
FMCW radar. Electronics 2021, 10, 747. [CrossRef]
8. Yan, J.; Hu, H.; Gong, J.; Kong, D.; Li, D. Exploring Radar Micro-Doppler Signatures for Recognition of Drone Types. Drones 2021,
7, 280. [CrossRef]
9. Dogru, S.; Marques, L. Drone Detection Using Sparse Lidar Measurements. IEEE Robot. Autom. Lett. 2022, 7, 3062–3069. [CrossRef]
10. Nelega, R.; Belean, B.; Valeriu, R.; Turcu, F.; Puschita, E. Radio Frequency-Based Drone Detection and Classification using Deep
Learning Algorithms. In Proceedings of the 2023 International Conference on Software, Telecommunications and Computer
Networks (SoftCOM), Split, Croatia, 21–23 September 2023.
11. Fu, Y.; He, Z. Radio Frequency Signal-Based Drone Classification with Frequency Domain Gramian Angular Field and Convolu-
tional Neural Network. Drones 2024, 8, 511. [CrossRef]
12. Shi, Z.; Chang, X.; Yang, C.; Wu, Z.; Wu, J. An Acoustic-Based Surveillance System for Amateur Drones Detection and Localization.
IEEE Trans. Veh. Technol. 2020, 69, 2731–2739. [CrossRef]
13. Ahmed, C.A.; Batool, F.; Haider, W.; Asad, M.; Raza Hamdani, S.H. Acoustic Based Drone Detection Via Machine Learning. In
Proceedings of the 2022 International Conference on IT and Industrial Technologies (ICIT), Shanghai, China, 28–31 March 2022.
14. Zhao, J.; Zhang, J.; Li, D.; Wang, D. Vision-Based Anti-UAV Detection and Tracking. IEEE Trans. Intell. Transp. Syst. 2022, 23,
25323–25334. [CrossRef]
15. Ghosh, S.; Patrikar, J.; Moon, B.; Hamidi, M.M.; Scherer, S. AirTrack: Onboard Deep Learning Framework for Long-Range Aircraft
Detection and Tracking. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London,
UK, 29 May–2 June 2023.
16. Coluccia, A.; Fascista, A.; Schumann, A.; Sommer, L.; Dimou, A.; Zarpalas, D.; Méndez, M.; de la Iglesia, D.; González, I.;
Mercier, J.P.; et al. Drone vs. Bird detection: Deep learning algorithms and results from a grand challenge. Sensors 2021, 21, 2824.
[CrossRef] [PubMed]
17. Ding, L.; Xu, X.; Cao, Y.; Zhai, G.; Yang, F.; Qian, L. Detection and tracking of infrared small target by jointly using SSD and
pipeline filter. Digit. Signal Process. Rev. J. 2021, 110, 102949. [CrossRef]
18. Fang, H.; Ding, L.; Wang, L.; Chang, Y.; Yan, L.; Han, J. Infrared Small UAV Target Detection Based on Depthwise Separable Residual Dense Network and Multiscale Feature Fusion. IEEE Trans. Instrum. Meas. 2022, 71, 1–20. [CrossRef]
19. Wu, X.; Li, W.; Hong, D.; Tao, R.; Du, Q. Deep Learning for Unmanned Aerial Vehicle-Based Object Detection and Tracking: A
survey. IEEE Geosci. Remote Sens. Mag. 2022, 10, 91–124. [CrossRef]
20. Svanström, F.; Alonso-Fernandez, F.; Englund, C. Drone Detection and Tracking in Real-Time by Fusion of Different Sensing
Modalities. Drones 2022, 6, 317. [CrossRef]
21. Alldieck, T.; Bahnsen, C.H.; Moeslund, T.B. Context-aware fusion of RGB and thermal imagery for traffic monitoring. Sensors
2016, 16, 1947. [CrossRef] [PubMed]
22. Yang, L.; Ma, R.; Zakhor, A. Drone Object Detection Using RGB/IR Fusion. In Proceedings of the Symposium on Electronic
Imaging: Computational Imaging XX, Online, 17–20 January 2022.
23. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object
detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC,
Canada, 18–22 June 2023.
24. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022.
25. Evangelidis, G.D.; Psarakis, E.Z. Parametric image alignment using enhanced correlation coefficient maximization. IEEE Trans.
Pattern Anal. Mach. Intell. 2008, 30, 1858–1865. [CrossRef] [PubMed]
26. Lopes, J.P.D.; Suleman, A.; Figueiredo, M.A.T. Detection and Tracking of Non-Cooperative UAVs: A Deep Learning Moving-Object
Tracking Approach. M.Sc. Thesis, Instituto Superior Técnico, Lisbon, Portugal, 2022.
27. Sun, C.; Zhang, C.; Xiong, N. Infrared and visible image fusion techniques based on deep learning: A review. Electronics 2020, 9,
2162. [CrossRef]
28. Ma, J.; Yu, W.; Liang, P.; Li, C.; Jiang, J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Inf.
Fusion 2019, 48, 11–26. [CrossRef]
29. Szeliski, R. Computer Vision: Algorithms and Applications, 2nd ed.; Springer: New York, NY, USA, 2021; pp. 33–96.
30. Pedro, S.; Tomás, D.; Vale, J.L.; Suleman, A. Design and performance quantification of VTOL systems for a canard aircraft.
Aeronaut. J. 2021, 125, 1768–1791. [CrossRef]
31. Castellani, N.; Pedrosa, F.; Matlock, J.; Mazur, A.; Lowczycki, K.; Widera, P.; Zawadzki, K.; Lipka, K.; Suleman, A. Development
of a Series Hybrid Multirotor. In Proceedings of the 13th EASN International Conference on Innovation in Aviation & Space for
opening New Horizons, Salerno, Italy, 5–8 September 2023.
32. Zheng, Y.; Lin, S.; Kambhamettu, C.; Yu, J.; Kang, S.B. Single-Image Vignetting Correction. IEEE Trans. Pattern Anal. Mach. Intell.
2009, 31, 2243–2256. [CrossRef] [PubMed]
33. Zheng, Y.; Grossman, M.; Awate, S.; Gee, J. Automatic Correction of Intensity Nonuniformity From Sparseness of Gradient
Distribution in Medical Images. In Proceedings of the 12th International Conference on Medical Image Computing and Computer
Assisted Intervention, London, UK, 20–24 September 2009.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.