
Chapter - 1

INTRODUCTION
Small object detection has significant research and application value. For instance, tiny objects
such as nuts, screws, washers, nails, and fuses can be present on airport runways, and accurate
detection of these objects could prevent major aviation accidents and economic losses. For
autonomous driving, accurately detecting small objects that could cause traffic accidents in
high-resolution scene photos captured by cars is essential. In satellite remote sensing images,
targets may occupy only a small area or even a few pixels. Small targets usually lack
sufficient appearance information relative to regular-sized targets, making it difficult to
distinguish them from the background or from similar targets. YOLO and SSD are popular object
detection models in industry and have proven highly efficient for real-time object detection.
However, their performance on small object detection datasets is still far from ideal.
Previous research shows that on the public object detection dataset MSCOCO, there is a
significant gap in detection performance between small and large targets, with the mean average
precision for small targets typically being half that of large targets. Thus, small object
detection clearly remains challenging. In addition, real-world scenes are usually intricate and
complex, often containing illumination changes, target occlusion, densely packed targets, etc.,
and the effects of these factors on small target features further increase the difficulty of
small object detection.

Human eyes are more sensitive to moving objects than to stationary ones. Previous studies have
shown that the human visual perception system relies on temporal and spatial resolution to
recognize objects, and that perception of and attention to moving objects are higher than for
stationary objects. For detecting vehicle targets from cameras mounted on drones, we try to
integrate motion trend information into the object detection process. Vehicle objects in
conventional drone images are usually small, and increasing their temporal resolution can improve
small object detection performance. To do this, we use an image-to-image translation method to
construct prior knowledge of Eulerian motion information from our collected dataset, which
requires answering two questions:


1. How to construct a motion trend map.

2. How to integrate motion information into 2D drone image object detection.

Detecting small objects is a crucial challenge in various fields due to its significant research and
practical applications. For example, tiny objects like nuts, screws, washers, nails, and fuses on
airport runways pose serious safety risks, and accurately identifying them can help prevent
aviation accidents and economic losses. In the case of autonomous vehicles, detecting small
objects on the road is essential to avoid potential traffic hazards. Similarly, in satellite remote
sensing images, objects of interest often occupy only a few pixels, making their identification
difficult. The challenge with small object detection arises from their minimal visual details,
which makes distinguishing them from the background or similar-looking objects difficult.

Modern object detection models such as YOLO and SSD have proven effective for real-time
detection tasks. However, they still struggle with small object detection, as previous studies
indicate a noticeable performance gap between detecting small and large objects. For instance, in
the MSCOCO dataset, the mean average precision (mAP) for small objects is significantly lower
than for larger objects. This issue is further compounded by real-world conditions, where varying
lighting, object occlusion, and densely packed targets create additional obstacles for detection
algorithms.

One potential solution to improve small object detection involves leveraging motion-based
information. Studies show that human vision is naturally more attuned to moving objects than
static ones, relying on both spatial and temporal resolution to perceive them effectively. Inspired
by this, researchers aim to integrate motion trend data into object detection tasks. This approach
is particularly useful in drone-based vehicle detection, where vehicles often appear small in
captured images. Enhancing their temporal resolution by constructing motion trend maps can
improve detection accuracy. The challenge, however, lies in effectively generating motion trend
maps and incorporating this data into conventional 2D object detection models for drones.

The comparison of detection performance on the VisDrone dataset between YOLO-V5 with and
without the Collaborative Filtering Mechanism (CFM) highlights the advantages of integrating
CFM for small object detection. Small object detection presents unique challenges due to limited appearance information, background noise, and occlusions, making it difficult for conventional
object detection models to achieve high accuracy.

This is particularly evident in applications such as autonomous driving, satellite remote sensing,
and drone-based surveillance, where small targets like vehicles, debris, or tiny structural
elements must be identified with precision.

Despite the efficiency of models like YOLO and SSD in real-time detection, their performance
on small objects remains suboptimal. Studies have shown that on datasets such as MSCOCO, the
mean average precision (mAP) for small objects is significantly lower compared to larger ones.
This gap is further exacerbated by complex real-world conditions, including variable lighting,
densely packed targets, and occlusions.

The introduction of CFM in YOLO-V5 aims to address these challenges by filtering out
irrelevant background information and enhancing the focus on motion-based and spatially
relevant features. By leveraging CFM, the detection pipeline effectively improves feature
extraction, leading to higher precision and recall for small objects.

Experimental results on the VisDrone dataset demonstrate that YOLO-V5 with CFM
significantly outperforms its baseline counterpart, particularly in detecting small, occluded, and
densely connected objects.

This makes CFM-enhanced YOLO-V5 a promising approach for applications that require
accurate small object detection, such as traffic monitoring, security surveillance, and remote
sensing.

A. Constructing Motion Trend Map

The motion trend map, which contains motion trend information, can be obtained from sequential
frames of drone images. We designed an algorithm that calculates the difference between
adjacent frames and constructs a difference matrix, which we call the motion trend map, to
indicate moving objects. An image translation model is then used to learn the mapping from a
2D image to its corresponding motion trend map. The reason we cannot directly use drone clips
to compute motion trend maps in real time is that when the drone moves, all pixels in the frame
move simultaneously. To avoid this, the process of calculating the motion prior knowledge is
carried out on near-stationary aerial clips and must be separated from the detection branch, so
that we can filter out irrelevant information by accumulating position changes and only attend
to the objects that actually move on the surface. We will show more details in Section 3.

B. Motion Information Integration

After obtaining the motion trend map, we need to consider how to integrate it into the object
detection model so that the model can take advantage of the motion trend prior. In this paper
we propose a Collaborative Filtering Mechanism (CFM) that enables the model to filter out the
parts of the feature map that do not contain motion trends, thereby improving detection
performance. CFM is inspired by the design of the pooling layer in deep learning. A traditional
pooling layer, such as a max pooling layer, mainly performs downsampling without harming
detection results, which implies that the feature map produced by convolutional feature
extraction contains redundant information. In small object detection, such irrelevant
information usually occupies most of the image area. Reasoning in reverse, we found that
filtering this "redundant" information in a more targeted manner is a promising direction for
small object detection. The detailed design of CFM is explained in Section 3.

Fig. 1. Motion trend map generation using a GAN: on the left is an input 2D image-motion
trend map pair, and on the right is the schematic of the PIX2PIX model, where the red block
in the center is the Self-Attention module we added.

Department of Artificial Intelligence & Machine Learning, Acharya Institute of Technology 4


Seminar Title

MODEL DESIGN

A. Motion Trend Map Generation

To form the motion trend map, we compute the pixel-level differences between adjacent 2D
frames as motion features. Consider a motion trend map m_i for frame f_i in a frame sequence F,
where all frames share the same dimensions. For frame f_i and its next frame f_{i+1}, the
motion trend map is defined as

m_i(p̂) = f_{i+1}(p̂) − f_i(p̂)

where p̂ is a point's (x, y) coordinate in frame i. Equivalently,

f_{i+1}(p̂) = f_i(p̂) + m_i(p̂)

In this way, we define the generation of the motion information feature map. Afterwards, we
transform the pixels in m_i into color space using Algorithm 1, which gives us RGB image →
RGB motion trend map pairs (shown in Figure 1) that can be used for image-to-image training.

Algorithm 1: Color-Shift Matching

Initialization:
    point_color ← (0, 0, 0) in RGB space
    shift_x ← m[0], shift_y ← m[1]
    max_x ← 640, max_y ← 640

ColorMatch:
    if (shift_x ≠ 0) and (shift_y ≠ 0) then
        Value_red ← 256 + int(shift_x ÷ max_x × 128)
        Value_green ← 256 + int(shift_y ÷ max_y × 128)
        point_color ← (Value_red, Value_green, 0)
    else
        point_color ← (0, 0, 0)
    end if
    return point_color
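
As a concrete illustration, the following Python sketch shows one possible reading of Algorithm 1 applied to a whole per-pixel displacement field. The array shapes, the assumption that the displacement comes from the frame-differencing step above, and the clipping of channel values to the 0-255 RGB range are our own assumptions for illustration, not details fixed by the report.

import numpy as np

def color_shift_matching(displacement, max_x=640, max_y=640):
    # displacement: H x W x 2 array holding per-pixel (shift_x, shift_y) values,
    # assumed to come from the frame-differencing step described above.
    h, w, _ = displacement.shape
    motion_map = np.zeros((h, w, 3), dtype=np.uint8)  # black where nothing moves

    shift_x = displacement[..., 0]
    shift_y = displacement[..., 1]
    moving = (shift_x != 0) & (shift_y != 0)  # points with a non-zero shift, as in Algorithm 1

    # Map displacements to the red and green channels, mirroring Algorithm 1.
    # The report uses a base value of 256; we clip so the result stays a valid RGB image.
    value_red = np.clip(256 + (shift_x / max_x * 128).astype(int), 0, 255)
    value_green = np.clip(256 + (shift_y / max_y * 128).astype(int), 0, 255)

    motion_map[..., 0] = np.where(moving, value_red, 0)
    motion_map[..., 1] = np.where(moving, value_green, 0)
    return motion_map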

After generating the motion trend maps, we can train an image translation model to learn the
mapping from an RGB image to its motion trend map. To better aggregate high-level semantic
information, we add a self-attention mechanism to the bottleneck layer in the middle of the
otherwise standard U-Net generator of the Pix2Pix model.

This process is shown in Figure 1.
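
A minimal sketch of the kind of self-attention block that could sit in such a bottleneck is given below (PyTorch). The single-head formulation, the channel reduction factor, and the learnable residual weight are illustrative assumptions in the style of common self-attention GAN blocks, not the report's exact design.

import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    # Single-head self-attention over the spatial positions of a feature map.
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)      # B x HW x C/8
        k = self.key(x).flatten(2)                        # B x C/8 x HW
        attn = torch.softmax(q @ k, dim=-1)               # B x HW x HW attention weights
        v = self.value(x).flatten(2)                      # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w) # aggregate values over all positions
        return self.gamma * out + x                       # residual connection

# Example: applied at a Pix2Pix-style bottleneck with 512 channels on an 8x8 feature map.
feat = torch.randn(1, 512, 8, 8)
print(SelfAttention2d(512)(feat).shape)  # torch.Size([1, 512, 8, 8])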

B. Design of Collaborative Filtering Mechanism

The core idea of the Collaborative Filtering Mechanism is to use the motion trend map to
assist the 2D image in the feature extraction part of YOLO-V5 by separating out the features
that are irrelevant to the detection target. To do this, we build a parallel pipeline identical
to YOLO-V5's 2D image feature extraction and use the motion trend map as a mask at each step to
filter the feature maps computed from the original input image at each layer. For each element
in the feature map produced at each step for the 2D image, we look up the element at the same
position in the motion trend map: if it is non-zero, the feature element is kept; if it is zero,
the corresponding element of the feature map is cleared. This process is visualized in Figure 2
and Figure 3. We demonstrate the effect of CFM in the experimental section.
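
The sketch below illustrates this masking step on a single layer. It assumes the motion trend map is resized to the feature map's spatial resolution with nearest-neighbor interpolation; that resizing choice, and applying the mask to a single backbone layer in isolation, are our assumptions for illustration rather than the report's exact implementation.

import torch
import torch.nn.functional as F

def cfm_filter(feature_map, motion_trend_map):
    # feature_map:      B x C x H x W tensor from a YOLO-V5 backbone layer
    # motion_trend_map: B x 1 x H0 x W0 map, zero wherever no motion trend exists
    mask = F.interpolate(motion_trend_map, size=feature_map.shape[-2:], mode="nearest")
    mask = (mask != 0).float()          # keep positions with motion, clear everything else
    return feature_map * mask

feat = torch.randn(1, 64, 80, 80)
trend = torch.zeros(1, 1, 640, 640)
trend[..., 100:200, 300:400] = 1.0      # a region assumed to contain a moving object
print(cfm_filter(feat, trend).abs().sum() > 0)  # features survive only inside that region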


Chapter - 2

LITERATURE SURVEY
In recent years, deep neural networks based on Convolutional Neural Networks (CNNs) have
achieved remarkable success in object detection. One of the pioneering approaches was proposed
by Girshick et al., known as RCNN, which transformed the object detection problem into a
classification task. This method marked a significant breakthrough in object detection. Building
on this, Fast RCNN combined the advantages of RCNN and SPPNet, introducing ROI pooling to
address the challenge of different input scales.

Further advancements led to the development of Faster RCNN, where Ren et al. introduced a
Region Proposal Network (RPN) to reduce the computational cost of generating candidate
frames. Later, Dai et al. proposed the Region-based Fully Convolutional Network (RFCN),
which replaced fully connected layers with position-sensitive score maps obtained through full
convolution. This innovation significantly improved detection speed.

Another key development was the Feature Pyramid Network (FPN) introduced by Lin et al.
Before FPN, most CNN-based detectors performed detection only at the top layer of the network.
While deep CNN features are beneficial for category identification, they do not always aid in
precise target localization. FPN addressed this by incorporating a laterally connected top-down
structure, making substantial progress in multi-scale detection tasks. Today, it is a fundamental
component of many state-of-the-art models.

Following these developments, the industry focused on creating high-performance, widely
applicable object detection models. YOLO-V5, an evolution of YOLO-V3, is one such model
that balances real-time processing with high accuracy. It allows users to select different model
sizes depending on the detection task and working environment. YOLO-V5 employs a deep
residual network to extract target features and utilizes the PANet structure for multi-scale
predictions. However, it still performs three downsampling operations during feature extraction, which
can lead to a loss of target feature information, making it less suitable for detecting small objects.

Despite the impressive performance of YOLO-V5, the issue of small object detection remains a
challenge due to the loss of fine-grained features during multiple down-sampling operations. To address this limitation, researchers have explored various enhancements, such as incorporating
attention mechanisms, optimizing feature fusion strategies, and using higher-resolution input
images. Additionally, newer versions like YOLO-V7 and YOLO-V8 have introduced structural
improvements to further refine accuracy and efficiency in real-time detection scenarios.

Looking ahead, object detection models continue to evolve with the integration of transformer-
based architectures like Vision Transformers (ViTs) and Swin Transformers. These models
leverage self-attention mechanisms to capture long-range dependencies and enhance feature
representations. Moreover, hybrid approaches combining CNNs with transformers are being
developed to balance efficiency and accuracy.

The adoption of self-attention mechanisms in object detection has significantly improved the
ability of models to capture fine-grained spatial relationships. Traditional CNN-based
architectures rely on localized receptive fields, limiting their ability to understand long-range
dependencies in an image. The introduction of Transformer-based models, such as DEtection
TRansformers (DETR) and Swin Transformer, has allowed object detection systems to model
complex spatial correlations more effectively. These approaches leverage multi-head self-
attention to enhance feature extraction, making them particularly useful for detecting small or
occluded objects in cluttered environments, such as aerial drone images.

Another crucial innovation in modern object detection is multi-scale feature fusion, which
enhances the ability of networks to detect objects of varying sizes. Traditional detection
frameworks often struggle with objects that appear at significantly different scales within the
same image. Techniques like Feature Pyramid Networks (FPN), Path Aggregation Networks
(PANet), and BiFPN (Bidirectional Feature Pyramid Network) have been developed to address
this issue. These architectures ensure that both high-resolution and low-resolution features
contribute to the final predictions, improving overall accuracy. For real-time applications like
autonomous drones, traffic monitoring, and surveillance, multi-scale detection is essential for
identifying both large vehicles and smaller objects, such as pedestrians or debris.
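
As a rough illustration of the top-down fusion idea behind FPN-style necks, the sketch below merges a coarse, semantically strong feature map into a finer one. The channel counts, the 1x1 lateral convolutions, and the use of nearest-neighbor upsampling are generic assumptions in the spirit of FPN, not the exact design of FPN, PANet, or BiFPN.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    # Fuse a coarse (low-resolution) map into a finer (high-resolution) map.
    def __init__(self, coarse_ch, fine_ch, out_ch=256):
        super().__init__()
        self.lateral = nn.Conv2d(fine_ch, out_ch, kernel_size=1)   # align fine channels
        self.reduce = nn.Conv2d(coarse_ch, out_ch, kernel_size=1)  # align coarse channels
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, coarse, fine):
        top_down = F.interpolate(self.reduce(coarse), size=fine.shape[-2:], mode="nearest")
        return self.smooth(self.lateral(fine) + top_down)  # element-wise sum, then smooth

# Example: a 20x20 deep map fused into a 40x40 shallower map.
p5 = torch.randn(1, 512, 20, 20)
c4 = torch.randn(1, 256, 40, 40)
print(TopDownFusion(512, 256)(p5, c4).shape)  # torch.Size([1, 256, 40, 40])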

The increasing demand for real-time object detection on edge devices has led to the optimization
of lightweight architectures. Many state-of-the-art models, including YOLO-V7, YOLO-V8, and
MobileNet-based SSD, focus on reducing computational complexity while maintaining high
accuracy. Quantization techniques, knowledge distillation, and pruned neural networks allow
models to operate efficiently on embedded hardware, such as drones and mobile devices.
Additionally, hardware-accelerated solutions using TensorRT, OpenVINO, and Edge TPU
enable faster inference speeds with lower power consumption. These advancements make object
detection models more practical for deployment in resource-constrained environments, ensuring
both efficiency and reliability.


Chapter - 3

PROBLEM STATEMENT AND OBJECTIVES


Object detection is a fundamental task in computer vision, with applications in surveillance,
autonomous driving, medical imaging, industrial quality control, and robotics.

The evolution of deep learning, particularly Convolutional Neural Networks (CNNs), has led to
significant advancements in object detection models, improving accuracy and efficiency.

Landmark models such as RCNN, Fast RCNN, Faster RCNN, and YOLO have addressed many
challenges in object detection, making real-time applications feasible. However, detecting small
objects, occluded objects, and objects in complex backgrounds remains a persistent challenge.

One major limitation in existing models is the loss of fine-grained features due to multiple down
sampling operations. While architectures like Feature Pyramid Networks (FPN) and Path
Aggregation Networks (PANet) have attempted to mitigate this issue, small object detection
remains suboptimal.

Additionally, most CNN-based models rely on deep feature extraction, which is beneficial for
classification but often lacks precision in localization, particularly for small and overlapping
objects.

Another challenge is maintaining a balance between detection speed and accuracy. While models
like YOLO-V5 have demonstrated real-time performance, their reliance on multiple down
sampling operations reduces the ability to retain critical spatial information.

Furthermore, new approaches such as Vision Transformers (ViTs) and hybrid CNN-transformer
architectures are emerging, showing promising improvements in object representation and
feature extraction. However, their feasibility in real-time applications needs further exploration.

Given these challenges, there is a need for an optimized object detection framework that
improves small object detection, enhances multi-scale feature extraction, and balances
computational efficiency with accuracy.


This study aims to analyze existing detection models, explore new architectures, and propose an
improved detection framework to overcome current limitations. Small object detection remains
one of the most difficult problems in computer vision. In many real-world scenarios, objects like
screws, nuts, or pedestrians occupy only a small portion of the image, making them difficult
to identify.

Conventional models struggle to detect such small objects because they lose vital spatial
information during down-sampling operations. Additionally, the limited appearance of small
objects leads to challenges in distinguishing them from the background or other similar objects,
which often results in high false-negative rates. Even with advanced models such as YOLO and
SSD, the accuracy of small object detection still lags behind that of larger objects, highlighting a
significant gap in the current state of the art.

While many modern object detection models excel at detecting larger objects, the trade-off
between speed and accuracy becomes more pronounced when dealing with small objects. Real-time
applications such as autonomous driving or surveillance systems require fast, efficient
processing without compromising detection quality. Many current models, including YOLO-V5,
sacrifice accuracy for speed due to their reliance on multiple down-sampling layers. This
results in a reduction of the fine-grained details essential for detecting small objects.

The challenge lies in designing an architecture that not only maintains real-time performance but
also improves the accuracy of small object detection, especially in dynamic and cluttered
environments where objects may be occluded or partially visible.

Emerging Technologies in Object Detection: Vision Transformers and Hybrid Models


The introduction of Vision Transformers (ViTs) in object detection has opened new avenues for
improving feature extraction and representation. Unlike traditional CNNs, which rely on
hierarchical feature extraction, ViTs focus on self-attention mechanisms, enabling the model to
capture long-range dependencies between pixels.

This approach enhances the ability to detect small objects by preserving contextual information
that is often lost in CNN-based architectures.


Hybrid models that combine the strengths of both CNNs and transformers are also gaining
popularity, offering a balance between feature extraction speed and accuracy. However, despite
their potential, these models face challenges in real-time applications, where computational
efficiency and fast inference times are critical.

Further research is needed to optimize these models for use in real-world environments where
both accuracy and speed are essential.

OBJECTIVES

1. Review the Evolution of Object Detection Models

o Analyze the transition from traditional object detection techniques to deep learning-based approaches.

o Compare the performance of models such as RCNN, Fast RCNN, Faster RCNN,
YOLO, and SSD in different detection scenarios.

o Explore the impact of transfer learning and pre-trained models on the evolution of
object detection.

o Examine the role of one-stage and two-stage detectors in balancing speed and
accuracy for different applications.

2. Identify Challenges in Small Object Detection

o Investigate the impact of multiple downsampling operations on feature loss.

o Analyze the limitations of existing models in detecting small, occluded, or overlapping objects.

o Evaluate the effectiveness of current feature extraction techniques in preserving fine details.



3. Optimize Feature Extraction and Multi-Scale Detection

o Study advanced feature extraction techniques such as Feature Pyramid Networks (FPN) and Path Aggregation Networks (PANet).

o Explore attention mechanisms and transformer-based approaches to improve feature representation.

o Develop strategies to enhance spatial information retention without increasing computational complexity.

4. Improve Real-Time Object Detection Performance

o Optimize model architectures to reduce computational overhead while maintaining high accuracy.

o Evaluate trade-offs between detection speed and precision across different object
scales.

o Implement techniques such as quantization and pruning to improve efficiency.

5. Explore Transformer-Based Object Detection Approaches

o Analyze the role of Vision Transformers (ViTs) in object detection.

o Compare CNN-based and transformer-based models to assess their strengths and weaknesses.

o Investigate hybrid approaches that combine CNNs and transformers for better
detection accuracy.

6. Develop a Robust Object Detection Framework

o Propose an improved model that integrates multi-scale feature extraction and real-
time efficiency.

o Validate the proposed model on benchmark datasets and real-world scenarios.


o Compare its performance with existing state-of-the-art object detection models.

7. Leverage Self-Attention Mechanisms for Enhanced Feature Extraction

o Study the impact of self-attention in improving spatial feature relationships.

o Implement self-attention in PIX2PIX-based models to refine object representation.

o Compare the effectiveness of self-attention-enhanced models with traditional CNN-based detection.

8. Integrate Motion Trend Maps for Improved Drone-Based Detection

o Utilize motion trend maps to filter out irrelevant background noise in aerial
imagery.

o Explore motion-based feature enhancement techniques to improve detection of small moving objects.

o Assess the impact of integrating motion-aware priors on model performance.

9. Benchmark Performance on Drone-Specific Datasets

o Conduct evaluations on datasets like VisDrone, UAVDT, and DOTA to test model generalizability.

o Analyze the effectiveness of various architectures under real-world drone surveillance conditions.

o Compare performance metrics, including mean Average Precision (mAP), precision, and recall.

10. Optimize Object Detection for Edge Deployment

o Implement lightweight models suited for real-time processing on resource-constrained devices.


o Utilize hardware accelerators like TensorRT, OpenVINO, and Edge TPU for
faster inference.

o Explore energy-efficient architectures to extend the deployment time of battery-powered drones.


Chapter - 4

METHODOLOGY

1. Collaborative Filtering Mechanism (CFM):

o CFM integrates a motion trend map into the feature extraction process of YOLO-V5
to eliminate irrelevant background noise and enhance object detection accuracy.

o The motion trend map functions as an adaptive filter, allowing only motion-relevant
features to be retained while discarding static or less informative elements.

o This mechanism is particularly useful for small object detection in aerial imagery,
where distinguishing between moving and stationary objects is a significant
challenge.

2. Parallel Pipeline Processing:


o A parallel processing pipeline is designed to run alongside YOLO-V5’s standard
feature extraction, ensuring the motion trend map is applied at every layer for
consistency.

o At each stage, if a feature map position corresponds to a zero value in the motion trend map, it is cleared, ensuring the model prioritizes moving objects.

o This approach significantly enhances the ability to track and detect small objects,
making it more effective for drone-based vision applications.

3. Self-Attention Mechanism:

o Self-attention is incorporated within PIX2PIX to enhance feature representation by allowing the model to focus on important regions during detection.

o This mechanism learns spatial dependencies, refining how the network processes
different regions of an image and improving detection precision.


o The integration of self-attention results in improved recall and precision scores, helping the model differentiate between objects with similar features.

4. GAN-based Motion Priors Estimation:

o Generative Adversarial Networks (GANs) are used to generate motion priors, which
help in learning the temporal movement of objects from video sequences.

o The self-attention modules within GANs further refine motion estimation, ensuring
the model can track and predict movement patterns more effectively.

o This enables the system to achieve higher accuracy in tracking small and fast-moving
objects, which are often missed by traditional detection models.

5. Displacement - Color Mapping Process:

o Motion displacement data is transformed into color-encoded representations, allowing neural networks to visually interpret motion trends in a more structured manner.

o This color mapping process simplifies complex motion information, making it easier
for the model to extract temporal-based movement patterns.

o It enhances the detection of small objects moving in cluttered environments, such as urban landscapes captured by drone cameras.

6. YOLO-V5s for Small Object Detection:

o The YOLO-V5s variant is chosen due to its lightweight architecture, making it suitable for real-time processing on resource-limited drone hardware.

o Despite being compact, YOLO-V5s maintains high detection accuracy, making it an ideal choice for small object recognition in aerial imagery.

o The combination of CFM and YOLO-V5s improves detection efficiency, allowing drones to effectively analyze their surroundings with minimal computational cost (an end-to-end inference sketch is given after this list).

7. Performance Evaluation on VisDrone Dataset:


o The proposed CFM module is evaluated on the VisDrone dataset, a benchmark dataset designed for drone vision tasks in urban and rural environments.

o Experimental results indicate a 10.49% improvement in mean Average Precision (mAP 0.5) compared to conventional YOLO-V5s models.

o Additionally, the precision and recall metrics show notable gains, proving the
effectiveness of motion-based feature filtering in real-world drone applications.

8. Experimental Validation and Benchmarking:

o The proposed method is benchmarked against baseline YOLO-V5s models, both with
and without the CFM enhancement, for a comparative performance analysis.

o Results confirm that integrating CFM and self-attention mechanisms leads to a more
accurate and efficient object detection system for drone-based vision.

o These improvements establish that motion-aware feature filtering can significantly enhance detection performance in aerial surveillance, traffic monitoring, and search-and-rescue operations.
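
To make the overall flow concrete, the short sketch below strings the pieces together at inference time. Here trend_generator (the trained Pix2Pix generator with self-attention) and yolo_v5s_with_cfm (the detector whose backbone features are filtered by CFM) are hypothetical callables standing in for the trained models; they are not actual APIs from the report or from any library.

import torch

def detect_with_cfm(rgb_image, trend_generator, yolo_v5s_with_cfm):
    # rgb_image: 1 x 3 x 640 x 640 tensor holding a single drone frame.
    with torch.no_grad():
        # 1. Predict the motion trend map from the still image (the learned motion prior).
        motion_trend_map = trend_generator(rgb_image)
        # 2. Run YOLO-V5s whose intermediate features are masked by the trend map (CFM).
        detections = yolo_v5s_with_cfm(rgb_image, motion_trend_map)
    return detections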


Fig. 2. Visualization of the effect with Collaborative Filtering Mechanism

Fig. 3. The workflow of Collaborative Filtering Mechanism in YOLO-V5


Chapter - 5

DISCUSSIONS
The proposed Collaborative Filtering Mechanism (CFM) effectively enhances object detection
performance in drone-based vision systems. By integrating motion trend maps into the YOLO-
V5 feature extraction process, the model filters out irrelevant background features, leading to
improved detection accuracy. The use of a parallel pipeline ensures that only motion-relevant
features are retained, which is particularly useful for small object detection in dynamic
environments.

One of the key insights from the study is the impact of self-attention mechanisms in PIX2PIX.
Experimental results indicate a notable improvement in detection performance when self-
attention is incorporated, as it helps in refining feature extraction and improving precision. The
results in Table I confirm that the PIX2PIX model with self-attention outperforms the version
without it, showing an increase of 4.22% in mAP 0.5 and 3.26% in mAP 0.5:0.95.

Furthermore, the use of GAN-based motion priors estimation introduces a novel approach to
motion tracking. By employing displacement-color mapping, motion features are effectively
visualized, making it easier for the network to interpret temporal-based movement patterns. This
process proves to be beneficial in detecting small moving objects, which are often challenging
for conventional detection models.

The VisDrone dataset plays a crucial role in evaluating the model's performance. As an
established benchmark for drone vision, the dataset provides diverse urban and rural scenarios.
The experimental results in Table II demonstrate that YOLO-V5s with the CFM module
significantly outperforms the original version, achieving a 10.49% improvement in mAP 0.5 and
substantial gains in precision and recall. These findings reinforce the effectiveness of integrating
motion-based filtering techniques into traditional object detection frameworks.

However, while the proposed approach delivers promising results, there are potential challenges.
The computational cost of integrating GANs with self-attention may impact real-time
performance on resource-constrained drone systems. Additionally, the approach relies heavily on the accuracy of motion trend maps, meaning that errors in motion estimation could affect
detection performance.

Future research could focus on optimizing the computational efficiency of the CFM framework
to enable real-time deployment on edge devices, such as drones with limited processing power.
Techniques like lightweight self-attention modules or quantized GAN models could help reduce
computational overhead without significantly compromising detection accuracy. Additionally,
exploring adaptive motion trend maps that adjust dynamically based on environmental
conditions could enhance the robustness of small object detection in varying scenarios.

Beyond small object detection, the principles behind CFM could be extended to other aerial
vision tasks, such as multi-object tracking, behavior analysis, and anomaly detection. By
integrating temporal motion priors into broader deep learning architectures, drone-based systems
could become more intelligent, allowing for autonomous decision-making in surveillance,
search-and-rescue, and traffic monitoring applications. Combining CFM with additional sensor
modalities, such as thermal imaging or LiDAR, could further improve detection performance,
especially in low-light or obscured environments.

Future Directions
o Optimizing Computational Efficiency: Enhancing the efficiency of the CFM module to
make it more suitable for real-time drone applications.

o Extending to Object Tracking: Applying motion-based filtering to improve tracking capabilities, enabling better multi-frame object persistence.

o Exploring Other Motion Estimation Techniques: Investigating alternative motion modeling approaches to improve robustness in various environmental conditions.

o Integrating Transformer-Based Architectures: Exploring Vision Transformers (ViTs) and Swin Transformers to enhance feature representation and improve small object detection in drone-based vision systems.


o Adaptive Feature Fusion Strategies: Developing dynamic feature fusion techniques that adjust based on object scale and motion patterns, improving detection accuracy in complex scenes.

o Lightweight Model Deployment for Edge Computing: Designing energy-efficient models optimized for deployment on low-power edge devices, ensuring real-time processing on drones without sacrificing accuracy.

All the experiments conducted in this study focus on the vehicle category of the VisDrone
dataset. The VisDrone dataset is a large-scale drone-based computer vision dataset, developed by
the AISKYEYE team at Tianjin University, China. It includes 10,209 still images and 288 video
clips, totaling 261,908 frames, captured in 14 different cities with varying environments, such as
urban and rural areas, and different densities, including sparse and crowded scenes. The dataset
provides over 2.6 million manually annotated bounding boxes for objects such as pedestrians,
vehicles, bicycles, and tricycles.

Importance of Drone-Based Detection

In recent years, drones have been widely used in traffic analysis and surveillance. As a
result, automated visual data analysis from drones has become increasingly important.
Due to the computing limitations of drones, this study uses YOLO-V5s, a lightweight
version of YOLO, to ensure efficiency while maintaining detection accuracy.

Recent advancements in drone-based surveillance and traffic monitoring have led to significant improvements in object detection methodologies.

Inspired by the human visual system, this study explores the integration of motion
information into object detection to enhance accuracy, particularly for small and occluded
objects. By leveraging the Collaborative Filtering Mechanism (CFM) in conjunction with
YOLO-V5s, we aim to filter out irrelevant background noise while emphasizing motion-
based features.


Performance Improvements with CFM

The study evaluates the Collaborative Filtering Mechanism (CFM), integrated with
YOLO-V5s, to enhance detection performance. From the experimental results, it was
observed that incorporating the Self-Attention mechanism led to an improvement of
4.22% in mAP 0.5 and 3.26% in mAP 0.5:0.95, compared to the standard model.


Comparative Analysis of YOLO-V5s with CFM

Further evaluations showed that using the CFM module significantly improved the
detection performance of YOLO-V5s on the VisDrone dataset. The mAP 0.5 score
improved by 10.49%, and similar enhancements were seen in mAP 0.5:0.95, Precision,
and Recall. These improvements confirm that the CFM module effectively enhances
small object detection for drone-based applications.

TABLE I: COMPARISON OF DETECTION PERFORMANCE ON VISDRONE BETWEEN PIX2PIX WITH OR WITHOUT THE SELF-ATTENTION MECHANISM, ON YOLO-V5 WITH CFM.

TABLE II: COMPARISON OF DETECTION PERFORMANCE ON VISDRONE BETWEEN YOLO-V5 WITH OR WITHOUT CFM.


The comparison of detection performance on the VisDrone dataset highlights the impact of
incorporating the Self-Attention mechanism in a Pix2Pix-based enhancement framework for
YOLO-V5 with the Collaborative Filtering Mechanism (CFM).

The experimental results indicate that the inclusion of the Self-Attention mechanism
significantly refines feature representation, leading to improved object detection accuracy.

This enhancement is particularly beneficial for small and occluded objects, where precise feature
extraction is crucial. By leveraging Self-Attention, the model effectively distinguishes relevant
objects from background noise, resulting in superior detection performance compared to its
counterpart without Self-Attention.

The comparison of detection performance on the VisDrone dataset demonstrates that incorporating the Self-Attention mechanism in a Pix2Pix-based enhancement framework for
YOLO-V5 with the Collaborative Filtering Mechanism (CFM) significantly improves object
detection accuracy. Self-Attention enhances feature representation by capturing long-range
dependencies and suppressing background noise, which is particularly beneficial for detecting
small and occluded objects.

Traditional convolutional models struggle with such objects due to limited receptive fields,
whereas Self-Attention allows the model to focus on relevant regions more effectively. This
integration leads to superior detection performance, improving precision and recall compared to
the standard model without Self-Attention, making it a promising approach for real-world
applications like drone-based surveillance and traffic monitoring.

The comparison of detection performance on the VisDrone dataset between YOLO-V5 with and
without the Collaborative Filtering Mechanism (CFM) demonstrates the effectiveness of CFM in
improving object detection.

Experimental results show that integrating CFM enhances feature extraction by filtering out
irrelevant background noise and emphasizing motion-based features. This leads to increased
precision and recall, particularly for small and occluded objects.

The model with CFM achieves superior detection accuracy compared to the standard YOLO-V5,
validating its potential in drone-based surveillance and traffic monitoring applications.


The comparison of detection performance on the VisDrone dataset between YOLO-V5 with and
without the Collaborative Filtering Mechanism (CFM) shows that CFM significantly improves
object detection. By filtering out irrelevant background noise and emphasizing motion-based
features, CFM enhances feature extraction.

This leads to improved precision and recall, especially for small and occluded objects. The
model with CFM outperforms the standard YOLO-V5, demonstrating its effectiveness in drone-
based surveillance and traffic monitoring. CFM’s ability to refine feature representation validates
its potential for real-world applications.


CONCLUSION
The proposed Collaborative Filtering Mechanism (CFM) introduces a novel approach to
improving small object detection in drone vision by leveraging motion-based feature
enhancement. By integrating GAN-generated motion priors and self-attention modules, we
successfully refine the feature extraction process, leading to notable improvements in object
detection performance.

The experimental results on the VisDrone dataset demonstrate the effectiveness of CFM,
showing substantial gains in mean Average Precision (mAP), precision, and recall compared to
conventional YOLO-V5s models. The inclusion of self-attention mechanisms further enhances
feature representation, allowing the model to focus on salient motion patterns critical for object
detection in dynamic aerial environments.

Future research directions may include extending CFM for real-time object tracking on drones,
optimizing computational efficiency for edge deployment, and exploring multi-modal fusion
techniques for better scene understanding in aerial surveillance and traffic analysis applications.
Our work underscores the potential of motion-aware deep learning in advancing autonomous
drone perception and monitoring systems.

The experimental evaluation on the VisDrone dataset confirms that our CFM-enhanced YOLO-
V5s significantly outperforms the baseline models, achieving notable gains in mAP, precision,
and recall. Additionally, integrating GAN-based motion priors and self-attention mechanisms
further refines feature extraction, making the model more effective in handling occlusions and
detecting small, fast-moving objects. These improvements highlight the potential of combining
motion-aware learning techniques with existing object detection frameworks to enhance drone
vision applications.

In the future, our research could be extended to multi-object tracking and real-time anomaly
detection, which are crucial for applications like surveillance, traffic monitoring, and search-and-
rescue operations. Moreover, optimizing the model for low-power edge devices would enable its
deployment on drones for real-world tasks. Further studies could explore how temporal motion features can improve other deep learning tasks, such as action recognition and behavior analysis,
making drone vision systems more intelligent and adaptive.

The proposed Collaborative Filtering Mechanism enhances small object detection in drone
vision by incorporating motion-aware learning techniques. Unlike traditional methods that rely
solely on static image features, CFM integrates motion trend maps generated using a GAN-based
displacement-color mapping process. This approach refines feature extraction by selectively
emphasizing moving objects while suppressing irrelevant background noise. Additionally, the
inclusion of self-attention mechanisms strengthens the model’s ability to differentiate between
objects, improving detection accuracy in complex aerial environments.

Through extensive experiments on the VisDrone dataset, we demonstrate that CFM-enhanced YOLO-V5s significantly surpasses baseline models in mean Average Precision (mAP),
precision, and recall. The ability to leverage motion information allows the model to better detect
small, fast-moving objects that would otherwise be difficult to identify. Our results indicate that
combining GAN-based motion priors with self-attention modules leads to a more robust object
detection framework, particularly suited for drone surveillance and real-time monitoring
applications.

Future research directions for CFM could include extending the approach to real-time object
tracking, anomaly detection, and behavior analysis. These capabilities are essential for
applications such as smart traffic monitoring, security surveillance, and disaster response.
Moreover, optimizing the model for edge computing would allow real-world deployment on
drones with limited computational resources. Additionally, exploring multi-modal fusion
techniques—such as integrating LiDAR, thermal imaging, or radar data—could further enhance
scene understanding, making drone vision systems more adaptive and efficient in dynamic
environments.

The Collaborative Filtering Mechanism (CFM) introduces a motion-based enhancement strategy for small object detection in drone imagery, addressing challenges such as object occlusion,
background clutter, and varying scales. Unlike traditional detection methods that rely solely on
static image features, CFM incorporates motion trend maps generated using GAN-based image translation (Pix2Pix). These motion maps help filter out irrelevant background information
and highlight moving objects, improving detection accuracy. Additionally, the self-attention
mechanism refines feature extraction by ensuring the model focuses on important motion
patterns, leading to better object differentiation and robustness in aerial imagery analysis.

Experimental evaluations on the VisDrone dataset demonstrate that the CFM-enhanced YOLO-
V5s model significantly outperforms traditional object detection approaches, achieving higher
mean Average Precision (mAP), precision, and recall. The ability to leverage motion priors
allows for better detection of small, fast-moving, or partially occluded objects, making it highly
effective for drone-based surveillance, traffic monitoring, and security applications. Future
research could explore real-time tracking, anomaly detection, and multi-modal fusion with
additional data sources (such as LiDAR or thermal imaging) to further enhance detection
accuracy and adaptability in dynamic environments.


REFERENCES
[1] Hayun Lee, Gyeonghwan Hong, and Dongkun Shin. Shareable camera framework for
multiple computer vision applications. In 2018 20th International Conference on Advanced
Communication Technology (ICACT), pages 669–674, 2018.

[2] Bin Cao, Meng Li, Xin Liu, Jianwei Zhao, Wenxi Cao, and Zhihan Lv. Many-objective
deployment optimization for a drone-assisted camera network. IEEE Transactions on Network
Science and Engineering, 8(4):2756–2764, 2021.

[3] Hamed Ghasemi, Amin Mirfakhar, Mehdi Tale Masouleh, and Ahmad Kalhor. Control a
drone using hand movement in ros based on single shot detector approach. In 2020 28th Iranian
Conference on Electrical Engineering (ICEE), pages 1–5, 2020.

[4] Jorgen Wallerman, Jonas Bohlin, Mats B. Nilsson, and Johan E. S. Franssen. Drone-based
forest variables mapping of ICOS tower surroundings. In IGARSS 2018 - 2018 IEEE International
Geoscience and Remote Sensing Symposium, pages 9003–9006, 2018.

[5] Trinadh V S N Venna, Sarosh Patel, and Tarek Sobh. Application of image-based visual
servoing on autonomous drones. In 2020 15th IEEE Conference on Industrial Electronics and
Applications (ICIEA), pages 579–585, 2020.

[6] Assem Alsawy, Alan Hicks, Dan Moss, and Susan Mckeever. An image processing based
classifier to support safe dropping for delivery-bydrone. In 2022 IEEE 5th International
Conference on Image Processing Applications and Systems (IPAS), volume Five, pages 1–5,
2022.

[7] D. Yallappa, M. Veerangouda, Devanand Maski, Vijayakumar Palled, and M. Bheemanna.


Development and evaluation of drone mounted sprayer for pesticide applications to crops. In
2017 IEEE Global Humanitarian Technology Conference (GHTC), pages 1–7, 2017.

[8] Visarut Trairattanapa, Ankit A. Ravankar, and Takanori Emaru. Estimation of tree diameter
at breast height using stereo camera by drone surveying and mobile scanning methods. In 2020
59th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE),
pages 946–951, 2020.


[9] Omar Daniel Mora Granillo and Zizilia Zamudio Beltran. Real-time drone (UAV) trajectory
generation and tracking by optical flow. In 2018 International Conference on Mechatronics,
Electronics and Automotive Engineering (ICMEAE), pages 38–43, 2018.

[10] Kian Meng Yap, Kok Seng Eu, and Jun Ming Low. Investigating wireless network
interferences of autonomous drones with camera based positioning control system. In 2016
International Computer Symposium (ICS), pages 369–373, 2016.

[11] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell:
Lessons learned from the 2015 mscoco image captioning challenge. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 39(4):652–663, 2017.

[12] Sujeet Kumar, Prashant Johri, Avneesh Kumar, Sudeept Singh Yadav, and Harshit Kumar.
Multiple object detection using deep learning. In 2021 3rd International Conference on Advances
in Computing, Communication Control and Networking (ICAC3N), pages 380–384, 2021.

[13] Junying Zeng, Zuoyong Lin, Chuanbo Qi, Xiaoxiao Zhao, and Fan Wang. An improved
object detection method based on deep convolution neural network for smoke detection. In 2018
International Conference on Machine Learning and Cybernetics (ICMLC), volume 1, pages 184–
189, 2018.

[14] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan,
Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In
Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014,
Proceedings, Part V 13, pages 740–755. Springer, 2014.

[15] Richard A Abrams and Shawn E Christ. Motion onset captures attention. Psychological
Science, 14(5):427–432, 2003.

[16] Steven Franconeri and Daniel Simons. Moving and looming stimuli capture attention.
Perception psychophysics, 65:999–1010, 11 2003.

[17] Richard A Abrams and Shawn E Christ. Motion onset captures attention. Psychological
Science, 14(5):427–432, 2003


[18] Atsushi Senju, Toshikazu Hasegawa, and Yoshikuni Tojo. Does perceived direct gaze boost
detection in adults and children with and without autism? the stare-in-the-crowd effect revisited.
Visual Cognition - VIS COGN, 12:1474–1496, 11 2005.

[19] René Laprise. The Euler equations of motion with hydrostatic pressure as an independent
variable. Monthly Weather Review, 120(1):197–207, 1992.

[20] Aleksander Holynski, Brian L Curless, Steven M Seitz, and Richard Szeliski. Animating
pictures with eulerian motion fields. In Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pages 5810–5819, 2021.

[21] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies
for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference
on computer vision and pattern recognition, pages 580–587, 2014.

[22] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster rcnn: Towards real-time
object detection with region proposal networks. Advances in neural information processing
systems, 28, 2015.

[23] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep
convolutional networks for visual recognition. IEEE transactions on pattern analysis and
machine intelligence, 37(9):1904– 1916, 2015.

[24] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-fcn: Object detection via region-based fully
convolutional networks. Advances in neural information processing systems, 29, 2016.

[25] Adedeji Olugboja, Zenghui Wang, and Yanxia Sun. Parallel convolutional neural networks
for object detection [j]. Journal of Advances in Information Technology Vol, 12(4), 2021.

[26] Xingxing Xie, Gong Cheng, Jiabao Wang, Xiwen Yao, and Junwei Han. Oriented r-cnn for
object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision,
pages 3520–3529, 2021.

[27] Feng Shuang, Hanzhang Huang, Yong Li, Rui Qu, and Pei Li. AFE-RCNN: Adaptive feature
enhancement RCNN for 3D object detection. Remote Sensing, 14(5):1176, 2022.

[28] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie.
Feature pyramid networks for object detection. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 2117–2125, 2017.

[29] Guiyi Yang, Zhengyou Wang, and Shanna Zhuang. PFF-FPN: A parallel feature fusion module based on FPN in pedestrian detection. In 2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), pages 377–381, 2021.

[30] Di Liu and Fei Cheng. SRM-FPN: A small target detection method based on FPN optimized feature. In 2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), pages 506–509, 2021.

[31] Xiaoqi Yang and Liangliang Duan. MPTC-FPN: A multilayer progressive FPN with transformer-CNN based encoder for salient object detection. IEEE Access, 10:98816–98827, 2022.

[32] Jia Li, Ruiqi Li, Chensheng Wang, and Yangguang Li. Comparative research of FPN and MTCN in face attribute recognition. In 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), pages 539–543, 2019.

[33] Zhiqing Li, Erzhu Li, Tianyu Xu, Alim Samat, and Wei Liu. Feature alignment FPN for oriented object detection in remote sensing images. IEEE Geoscience and Remote Sensing Letters, 20:1–5, 2023.

[34] Yu-Ming Zhang, Jun-Wei Hsieh, Chun-Chieh Lee, and Kuo-Chin Fan. SFPN: Synthetic FPN for object detection. In 2022 IEEE International Conference on Image Processing (ICIP), pages 1316–1320, 2022.

[35] Huayu Li, Shuyu Miao, and Rui Feng. DG-FPN: Learning dynamic feature fusion based on graph convolution network for object detection. In 2020 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6, 2020.

[36] Junhao Hu, Lei Jin, and Shenghuo Gao. FPN++: A simple baseline for pedestrian detection. In 2019 IEEE International Conference on Multimedia and Expo (ICME), pages 1138–1143, 2019.

[37] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014.

[38] Muhammad Haris, Greg Shakhnarovich, and Norimichi Ukita. Task-driven super resolution: Object detection in low-resolution images. In Neural Information Processing: 28th International Conference, ICONIP 2021, Sanur, Bali, Indonesia, December 8–12, 2021, Proceedings, Part V, pages 387–395. Springer, 2021.

[39] Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V. Le. AutoAugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 113–123, 2019.

[40] Marcus D. Bloice, Peter M. Roth, and Andreas Holzinger. Biomedical image augmentation using Augmentor. Bioinformatics, 35(21):4522–4524, 2019.

[41] Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V. Le. RandAugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 702–703, 2020.

[42] Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, Qinghua Hu, and Haibin
Ling. Detection and tracking meet drones challenge. IEEE Transactions on Pattern Analysis and
Machine Intelligence, pages 1–1, 2021.

[43] Tianqu Zhao and Hong Jiang. Landing system for AR.Drone 2.0 using onboard camera and ROS. In 2016 IEEE Chinese Guidance, Navigation and Control Conference (CGNCC), pages 1098–1102, 2016.

[44] Beatriz Hernandez-Hernandez, Jose Martinez-Carranza, and Jose Rangel-Magdaleno. Keeping a moving target within the field of view of a drone’s onboard camera via stochastic estimation. In 2017 Workshop on Research, Education and Development of Unmanned Aerial Systems (RED-UAS), pages 150–155, 2017.

[45] Boguslaw Cyganek and Kazimierz Wiatr. Design of a visual frontend for parallel signal processing on underwater search drone. In 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), pages 1046–1047, 2018.

[46] Cheng-Fang Peng, Jun-Wei Hsieh, Shao-Wei Leu, and Chi-Hung Chuang. Drone-based
vacant parking space detection. In 2018 32nd International Conference on Advanced Information
Networking and Applications Workshops (WAINA), pages 618–622, 2018.

[47] Hakan Kayan, Raheleh Eslampanah, Faezeh Yeganli, and Murat Askar. Heat leakage detection and surveillance using aerial thermography drone. In 2018 26th Signal Processing and Communications Applications Conference (SIU), pages 1–4, 2018.

[48] Maitha Al Shamsi, Mohammed Al Shamsi, Rashed Al Dhaheri, Rashed Al Shamsi, Saif Al
Kaabi, and Younes Al Younes. Foggy drone: Application to a hexarotor uav. In 2018 Advances
in Science and Engineering Technology International Conferences (ASET), pages 1–5, 2018.

[49] Yanan Xu, Dexiang Yao, Xue Ren, and Yunhai Dai. Intelligent black ice detection and alert system using thermal imaging camera and drone. In 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pages 2328–2331, 2021.

[50] Noriyasu Yamamoto and Noriki Uchida. Improvement of image processing for a
collaborative security flight control system with multiple drones. In 2018 32nd International
Conference on Advanced Information Networking and Applications Workshops (WAINA),
pages 199–202, 2018.

[51] Andres Erazo, Eduardo Tayupanta, and Seok-Bum Ko. Epipolar geometry on drones
cameras for swarm robotics applications. In 2020 IEEE International Symposium on Circuits and
Systems (ISCAS), pages 1–5, 2020.

[52] Lin Meng, Takuma Hirayama, and Shigeru Oyanagi. Underwater-drone with panoramic
camera for automatic fish recognition based on deep learning. IEEE Access, 6:17880–17886,
2018.

[53] Kazi Mahmud Hasan, Wida Susanty Suhaili, S. H. Shah Newaz, and Md. Shamim Ahsan.
Development of an aircraft type portable autonomous drone for agricultural applications. In 2020
International Conference on Computer Science and Its Application in Agriculture (ICOSICA),
pages 1–5, 2020.

[54] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904–1916, 2015.

[55] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-FCN: Object detection via region-based fully convolutional networks. Advances in Neural Information Processing Systems, 29, 2016.
