Group 1 - Task 4
A Fast and Accurate Few-Shot Detector for Objects with Fewer Pixels in Drone Image (March 2021)
Motivation: Many state-of-the-art methods are not suitable for drone images because of the particular viewing perspective and the large number of small targets.
Method: This study proposes a fast few-shot detector for drone targets. The network consists of two branches: the support branch uses features from support images containing detected objects to optimize the initial target model, while the query branch takes the images containing objects to be detected. DenseNet is used as the backbone for feature extraction. In the support branch, multi-scale feature maps are generated and then undergo two operations, deconvolution and element-wise product, to create a single feature map with the size of the low-level feature map. This feature map is used to iteratively optimize the initial target model. The optimized target model combines this feature map with the feature maps from the query branch to create an attention feature map, which is then used for regression calculation, matching score maps, and classification to generate the final predictions.

Adaptive meta-knowledge transfer network for few-shot object detection in very high resolution remote sensing images (March 2024)
Motivation: FSOD faces challenges such as domain shifts, misclassification due to class similarities, and limited information from few samples.
Method: The paper proposes the Adaptive Meta-Knowledge Transfer Network (AMTN) for few-shot object detection. Training has two stages: first with abundant data from the base classes, then with limited, randomly selected labeled data from both base and novel classes. The network uses the Spatial-Frequency Joint Enhancement (SFJE) model to enhance query image features across the spatial and frequency domains. ResNet101 extracts features from the query and support sets to generate a query feature map and multiple support feature maps. These undergo MAP to one channel, a power function to enhance strong features, and correlation computation to create a similarity feature map. Softmax, Sum, and Sigmoid operations generate weights, and the query feature map is transformed to the frequency domain and then back to a new feature map. This map, together with the weights, forms a joint enhancement feature, which is processed by the RPN for proposals and RoI Align for bounding boxes.
Few-shot object detection on aerial imagery via deep metric learning and knowledge inheritance (August 2023)
Motivation: Object detection is vital for analyzing aerial imagery, but traditional CNN-based methods rely on large labeled datasets, which are costly and time-consuming to acquire and annotate.
Method: The proposed fine-tuned model consists of two stages. In Stage 1, a large number of fully labeled images from the base set are used as input. Features are extracted using ResNet101, multi-scale feature maps are created via FPN, and proposals are generated through the RPN. These proposals and multi-scale feature maps undergo RoI Pooling for encoding and pass through the RoI feature extractor for feature extraction. This stage predicts box classification and localization to determine object positions. In Stage 2, a small random subset of labeled samples from both the base and novel sets is used for further training. After RoI Pooling, the data is split into Base and Novel heads. The Novel head employs the Multi-Similarity encoding model to compute similarity and enhance performance before making predictions. Predictions from the Base and Novel heads are compared using a Coherence Loss function.

Transformation-Invariant Network for Few-Shot Object Detection in Remote Sensing Images (Nov 2023)
Motivation: The substantial scale and orientation variations of objects in remote sensing images pose significant challenges to existing few-shot object detection methods.
Method: The paper introduces a Transformation-Invariant Network (TINet). The network has two branches: a support branch and a query branch. The support branch takes images from the support set and binary masks representing the object positions. Features are extracted from them using ResNet50 and ResNet101 as backbones, and a MAP operation produces support feature maps with a single-channel dimension. In the query branch, images from the query set and their transformed versions are passed through the backbone for feature extraction. Features from the original images pass through the RPN to create region proposals, and these proposals are then transformed to match the transformed images. Both the original and transformed proposals undergo RoI Align to create feature maps, which are combined with the single-channel feature maps from the support branch to form aggregated feature maps. These aggregated feature maps serve as input to the RoI head for object position prediction.
Thanh:
Few-Shot Object Detection in Remote Sensing Imagery via Fuse Context Dependencies and Global Features (July 2023)
Motivation: The increasing abundance of remote sensing images, coupled with the lack of manual annotations for objects, has limited the widespread adoption of strongly supervised deep learning methods for object detection. This limitation arises from their inadequate performance when encountering unseen object categories.
Method: The proposed model comprises three main modules.
The Meta-feature Extractor: This component extracts multi-scale meta-features from the query set, using YOLOv5 as a backbone. YOLOv5 is well known for its efficiency in object detection tasks, making it suitable for extracting meta-features from the query images across different scales.
The Reweighting Module: Here, multi-scale global features are extracted from the support set using a Global Feature Pyramid Extractor (GFPE). Each feature is then divided into GE (Global Enhancement) blocks, which are optimized to capture relevant information. These GE blocks are subsequently combined to form new feature maps, which are then fused with the meta-features extracted from the query set, resulting in Fused Feature Maps.
The Feature Fusion Module: Finally, the Fused Feature Maps are used as inputs to the Prediction Layer, which uses them to detect objects in the input images.

IMPROVING FEW-SHOT OBJECT DETECTION WITH OBJECT PART PROPOSALS (Oct 23, 2023)
Motivation: One of the main challenges in effectively utilizing Remote Sensing Imagery (RSI) for object detection tasks is the necessity to identify highly specific object classes, often not present in the base model's training data. These classes are typically challenging to characterize without domain expertise and are costly to obtain in large quantities. Hence, there is a pressing demand for object detection techniques that can perform well with limited labeled examples, operating in a low-data regime.
Method: The proposed model aims to learn object parts. Objects in the input image are pre-detected, and their parts are fed into a shared feature extractor. RCNN is used to extract features, and FPN is employed to create multi-scale feature maps. These feature maps pass through the RPN for proposal generation and RoI Pooling for feature extraction from the proposals. These features then serve as input to the regression and classification heads based on contrastive learning. A feature queue is built to store previously learned knowledge to optimize predictions and prevent model forgetting.
Thành:
CrackNex: a Few-shot Low-light Crack Segmentation Model Based on Retinex Theory for UAV Inspections (2024)
Motivation:
Cracks are difficult to segment under conditions such as low light... (due to poor contrast between the cracks and the surrounding walls).
Method:
The paper proposes the CrackNex framework. First, both query and support (Q&S) images are used to generate reflectance and illumination features with a Decompose Net. Then, two CNN backbones are applied to the support reflectance and illumination images to generate feature MAPs (the support prototype P and the reflectance prototype Pr), and the paper uses PFM to fuse these two prototypes through a co-attention mechanism, creating P' and P'r. The other two CNN backbones are used to create reflectance query features, which serve as low-level features in the ASPP module to create new query features (F'q). Note that those backbones share weights with the former backbones in pairs. P' is then fed into SSP together with F'q to create an augmented prototype (P''). Finally, the paper computes the cosine distance and estimates a similarity map between P'' and F'q to generate the final predictions.

Text Semantic Fusion Relation Graph Reasoning (TSF-RGR) for Few-Shot Object Detection on Remote Sensing Images (2023)
Motivation:
Addressing the excessive dependence of most models on the size of the training data.
Method:
The paper designs a relation graph learning module. First, features are extracted from the input images through a backbone, and those features pass through an RPN to generate a large number of region proposals. Then, a TSE module encodes word embeddings for all class labels of those features. In TSE, the paper introduces a corpus, which can compensate for the lack of visual features. After that, an RGL module learns the relation features for each region proposal by constructing different relation types (semantic and spatial relations) with the help of a GGNN. Following that, the JRR module aggregates relational features and visual information into the classification and bounding box regression layers. Finally, a two-stage fine-tuning training process is introduced.
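The final CrackNex step, comparing a prototype with query features to obtain a similarity map, can be sketched roughly as follows. This is only an illustration in plain Python: the names `cosine_similarity` and `similarity_map`, the toy 2-channel features, and the 0.5 threshold are assumptions made for the example, not details taken from the paper.

```python
import math

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def similarity_map(prototype, query_features):
    """Compare one prototype vector with every spatial position.

    query_features is an H x W grid whose cells are C-dimensional vectors;
    the result is an H x W grid of similarity scores.
    """
    return [[cosine_similarity(prototype, cell) for cell in row]
            for row in query_features]

# Toy data (hypothetical): a 2-channel prototype and a 2x2 query feature grid.
prototype = [1.0, 0.0]
query_features = [
    [[2.0, 0.0], [0.0, 3.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]

sim = similarity_map(prototype, query_features)
# Assumed decision rule: positions with similarity above 0.5 are foreground.
mask = [[1 if s > 0.5 else 0 for s in row] for row in sim]
print(mask)
```

Positions whose features point in the same direction as the prototype receive scores near 1, regardless of feature magnitude, which is why cosine similarity is a common choice for prototype matching.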
FSOD4RSI: Few-Shot Object Detection for Remote Sensing Images via Features Aggregation and Scale Attention (2024)
Motivation:
Identify methods to address the factors (Object Scale, Object Pixels, and Object Range) that hinder the use of methods designed for remote sensing images.
Method:
The paper designs two modules (FAM and SAM) to build an improved FSOD model. First, features are extracted from query images and from support images (including masks) through a feature extractor and a reweight net, respectively. Then, by employing a transformer encoder fused with FPN, the FAM jointly encodes support and query features, using channel multiplication to enhance the aggregation of these features. After that, the SAM module acquires diverse context information through distinct convolution operations. This enables the aggregation of the obtained contextual details, which guides the network to adapt to objects of varying scales and emphasizes regions containing richer information. The object detection post-processing step incorporates the Soft-NMS algorithm, which suits the dense targets commonly encountered in remote sensing scenes.

Few-Shot Object Detection in Remote Sensing: Lifting the Curse of Incompletely Annotated Novel Objects (2024)
Motivation:
Addressing the issue of Inadequately Annotated Novel Objects (IANOs) in the context of remote sensing.
Method:
An ST-FSOD method is introduced with two major components: the ST-RPN and the self-training bounding box head (ST-BBH). First, images with novel objects pass through the backbone and FPN to create multilevel features. Then, the ST-RPN module takes those multilevel features as inputs and generates two sets of object proposals, corresponding to base and novel categories, respectively. After that, these proposals are merged and fed into RoI pooling layers to obtain RoI (region of interest) features. Finally, the ST-BBH module takes the RoI features as inputs and produces the final detection results. Specifically, it detects potential unannotated novel class objects and uses them as pseudo-labels to recall more novel class objects, thereby improving the model's performance.
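The Soft-NMS post-processing step mentioned for FSOD4RSI can be illustrated with a small sketch. It uses the Gaussian-decay variant of Soft-NMS in plain Python; the values of `sigma` and `score_threshold` and the toy boxes are assumptions for the example, and the summary above does not say which variant or settings the paper uses.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of deleting boxes."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_idx)
        kept.append((best_box, best_score))
        # Decay the scores of the other boxes according to their overlap.
        decayed = []
        for box, score in remaining:
            overlap = iou(best_box, box)
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_threshold:
                decayed.append((box, new_score))
        remaining = decayed
    return kept

# Toy detections (hypothetical): two heavily overlapping boxes and one isolated box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
for box, score in kept:
    print(box, round(score, 3))
```

Unlike hard NMS, the overlapping second box is kept with a reduced score rather than discarded, which helps when genuinely distinct objects sit close together, as in dense remote sensing scenes.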
Thành:
In defense of local descriptor-based few-shot object detection (2024)
Motivation:
To enhance the performance and generalization ability of object detection in images.
Method:
The method has three aspects. The first is feature representation: with densely sampled image patches, the conventional intersection over union (IoU) quantifies whether a patch (partially) covers the object, which provides labels for the subsequent supervised learning stage. Specifically, given pixel- or bounding-box-based annotations, the paper labels a patch as positive if its IoU is greater than a threshold t. In this way, a group of binary patch-based label masks, as well as latent contextual information (illustrated in the paper's next subsection), is obtained from each query image, so the limited queries are fully utilized. HOG is then used to represent the original image patches. The second aspect is feature learning, which learns contextual information among patches guided by a Kernel-InfoNCE loss. The final aspect, object inference, predicts object presence with a cosine similarity measure.
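The patch-labeling rule described above (a patch is positive if its IoU with an annotated object exceeds a threshold t) can be sketched as follows. The patch size, stride, threshold value, and helper names (`dense_patches`, `label_patches`) are hypothetical choices for illustration; the paper's actual sampling scheme may differ.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def dense_patches(width, height, patch, stride):
    """Enumerate square patches on a regular grid, row by row."""
    return [(x, y, x + patch, y + patch)
            for y in range(0, height - patch + 1, stride)
            for x in range(0, width - patch + 1, stride)]

def label_patches(patches, object_boxes, t):
    """Label a patch 1 if its IoU with any annotated box exceeds t, else 0."""
    return [1 if any(box_iou(p, obj) > t for obj in object_boxes) else 0
            for p in patches]

# Toy example (hypothetical): an 8x8 image, 4x4 patches, one annotated object.
patches = dense_patches(width=8, height=8, patch=4, stride=4)
labels = label_patches(patches, object_boxes=[(0, 0, 4, 5)], t=0.3)
print(list(zip(patches, labels)))
```

In this toy case only the top-left patch passes the threshold; the patch below it overlaps the object slightly (IoU 0.125) and stays negative.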
Trang:
Few-Shot Object Detection Based on Contrastive Class-Attention Feature Reweighting for Remote Sensing Images (27 December 2023)
Motivation: Addressing the challenge of limited labeled samples in remote sensing image object detection.
Method:
The paper introduces a remote sensing Few-Shot Object Detection (FSOD) model addressing the challenges of interclass blurring and limited novel class samples. The model enhances attention to instance-level RoI features through contrastive learning and class-attention feature weighting. The model architecture consists of a backbone network, an RPN for proposal generation, and a comprehensive detection module for category prediction and bounding box regression. Importantly, the backbone and RPN networks are class-independent, enabling transfer learning to novel classes without extensive retraining. Through contrastive learning with a distinguishing operator and a remodeling attention network, the model enhances discriminative feature representation for better differentiation between base and novel classes. The proposed training strategy involves creating a base detector, fine-tuning for novel class detection, and applying attention-weighted features during testing to achieve accurate object localization and classification.

Few-Shot Object Detection With Self-Adaptive Global Similarity and Two-Way Foreground Stimulator in Remote Sensing Images (31 August 2022)
Motivation: Address challenges faced by previous few-shot detection methods: the spatial similarity of support-query features is ignored, and the feature attention operation is performed in a unidirectional manner.
Method:
The proposed method aims to improve Few-Shot Object Detection (FSOD) in remote sensing imagery by leveraging spatial information and support features more effectively. The framework consists of two main innovations: the Self-Adaptive Global Similarity (SAGS) module with Background Suppression (BS) and the Two-Way Foreground Stimulator (TFS) module. In the SAGS module, a multiscale detection approach is employed to handle the scale diversity of objects in remote sensing images (RSIs). The BS technique subtracts background features from the query image's original features. Then, the spatial relation and detail embedding features are encoded and compared between the query and support images using SAGS. The TFS module enhances feature fusion by introducing two-way attention to detail embeddings based on the similarity map.
Few-Shot Aircraft Detection in Satellite Videos Based on Feature Scale Selection Pyramid and Proposal Contrastive Learning (14 September 2022)
Motivation: Address the challenges in identifying foreground objects in satellite videos.
Method:
The proposed method involves a two-stage process: base training followed by fine-tuning, with a focus on overcoming challenges such as small object size, poor distinguishability, and domain shift between base and novel classes. The method introduces a Feature Scale Selection Pyramid Network (FSSPN) to address issues related to feature fusion and contextual attention, enhancing object detection performance. Additionally, a Proposal Contrastive Learning component is incorporated into the loss function to improve the robustness of feature representation in few-shot cases. The method's effectiveness is validated through experiments using different K values and comparisons with existing approaches. Overall, the method targets few-shot object detection in satellite videos, particularly for small and poorly distinguishable objects.

Few-Shot Object Detection on Remote Sensing Images via Shared Attention Module and Balanced Fine-Tuning Strategy (23 September 2021)
Motivation: The limited performance of recent few-shot object detection methods on remote sensing images.
Method:
This paper proposes a novel method for few-shot object detection on remote sensing images. The key contributions include a Shared Attention Module (SAM) and a Balanced Fine-Tuning Strategy (BFS). SAM extracts multi-dimensional attention maps from base classes during base training, aiding focused feature extraction during few-shot fine-tuning. BFS mitigates sample imbalance between base and novel classes. It also introduces a Balanced L1 loss function to boost the influence of novel class objects on the loss, addressing misclassification issues. The combination of SAM and BFS improves the accuracy of few-shot object detection on remote sensing images. This method demonstrates promising results in detecting novel class objects with only a few annotated samples.
Trang:
Few-shot object detection based on global
context and implicit knowledge decoupled head
(21 January 2024)
Motivation: Address the challenges in remote
sensing image object detection caused by the slow
acquisition cycle and difficulties in labeling.
Method:
The approach focuses on data preprocessing and network architecture enhancements, using YOLOv7 as the baseline. Data preprocessing involves image segmentation and a generation model for data augmentation, which increases the probability of successfully detecting small targets in high-resolution images. The network architecture incorporates a GC block in the neck section and a decoupled head with implicit knowledge, aiming to improve generalization performance and interpretability. Attention enhancement strategies (such as the GC block) and learning strategies are also employed to improve model performance further. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method for few-shot detection in high-resolution images.
Thục:
Integrative Few-Shot Classification and Segmentation for Landslide Detection (2022)
Motivation: There has been an ongoing demand for monitoring landslides due to the heavy economic losses and casualties caused by such natural disasters.
Method:
The paper proposes an automatic annotation procedure to create a new landslide dataset consisting of 2963 images, termed the LandslidePTIT dataset. The proposed method is termed Cross Feature and Attentive Squeeze Network (CF-ASNet). We first extract feature maps of a query image and a support image from a ResNet50 and ResNet101 trained on ImageNet as a backbone network. Each pair of feature maps at the same level is then used to construct hypercorrelations. Secondly, the model learns to transform the correlation through an ASBlock, which gradually squeezes the support dimension on each query dimension, yielding the high-level hypercorrelations that are later employed to produce the mask prediction map. Finally, while producing the prediction, two adjacent correlations are cross-featured using a network: each high-level correlation tensor pair, after processing, results in a feature map that is upsampled and combined with the correlation of the same query dimension size using a BSlayer. The earliest feature map is fed to a convolutional decoder, which consists of bilinear upsampling interleaved with 2D convolutions that map the number of channels to 2 (foreground and background).

Few-shot Object Detection on Remote Sensing Images (2020)
Motivation: Current CNN-based methods mostly require a large number of annotated samples to train deep neural networks and tend to have limited generalization abilities for unseen object categories.
Method:
The few-shot object detection model (FSODM) is designed to leverage the meta-knowledge from the dataset of base classes. A Meta Feature Extractor module is first developed to learn meta-features at three different scales from input query images. Then a Reweighting Module takes as input N support images with labels, one for each class, and outputs three groups of N reweighting vectors, one for each scale. These reweighting vectors are used to recalibrate the meta-features of the same scale through a channel-wise multiplication. With the reweighting module, the meta-information from support samples is extracted and used to amplify those meta-features that are informative for detecting novel objects in the query images. The reweighted meta-features are then fed into three independent bounding box detection modules to predict the objectness scores (o), the bounding box locations and sizes (x, y, w, h), and class scores (c) at three different scales.
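The channel-wise reweighting used by FSODM (one reweighting vector per class, multiplied channel by channel into the query meta-features of one scale) can be sketched as below. The nested-list tensor layout, the function name `reweight`, and the toy sizes are assumptions for illustration; a real implementation would use a tensor library and repeat the operation at each of the three scales.

```python
def reweight(meta_features, reweighting_vectors):
    """Channel-wise multiplication of meta-features by class-specific vectors.

    meta_features: C x H x W nested lists (query features at one scale).
    reweighting_vectors: N x C nested lists (one vector per support class).
    Returns N x C x H x W nested lists, one reweighted copy per class.
    """
    num_channels = len(meta_features)
    for vector in reweighting_vectors:
        if len(vector) != num_channels:
            raise ValueError("each reweighting vector needs one weight per channel")
    return [
        [[[weight * value for value in row] for row in channel]
         for weight, channel in zip(vector, meta_features)]
        for vector in reweighting_vectors
    ]

# Toy example (hypothetical): C=2 channels of size 2x2 and N=2 classes.
meta_features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
reweighting_vectors = [
    [2.0, 0.0],   # class 0 amplifies channel 0 and suppresses channel 1
    [0.5, 3.0],   # class 1 emphasizes channel 1
]
reweighted = reweight(meta_features, reweighting_vectors)
print(reweighted[0])
print(reweighted[1])
```

Each class thus sees its own version of the query features, in which the channels that its support example marks as informative are amplified before the detection heads run.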
DMnet: A New Few-Shot Framework for Wind Turbine Surface Defect Detection (2022)
Motivation: Improving the ability to detect defects on the surface of wind turbine blades when training with limited data.
Method:
The paper constructs a new few-shot training framework called DMnet for wind turbine defect detection. To absorb prior knowledge, the machine learns from a large number of tasks to obtain high-level generalization capability. In each task, the pre-processed image data undergoes feature extraction by a CNN to get a deep feature map, which is finally input to the metric module for category determination. Further, the proposed dynamic activation mapping strategy monitors this process and provides real-time feedback and corrections. After that, the machine becomes a more powerful learner. For the target task, i.e., WT surface defect detection, defect recognition and localization on unseen samples can be achieved by fine-tuning on only a small number of supervised samples.

Few-Shot Object Detection With Self-Adaptive Attention Network for Remote Sensing Images (2021)
Motivation:
In some cases only limited data are available.
Method:
The proposed few-shot object detector is based on Faster R-CNN. Firstly, features are extracted from the support images with a feature extractor. Similarly, the query image is processed, and the RoI features are then obtained with the RPN and the RoI Align procedure. Secondly, through several shared fully-connected layers, the features are processed into high-dimensional vectors. These vectors are then fused with information from the support images according to a relation graph through a Relation GRU. Finally, the vectors fused with self-adaptive attention are used to obtain detection results through the prediction head.
Trường:
Insulator Anomaly Detection Method
Based on Few-Shot Learning
Motivation: Unmanned aerial vehicles (UAVs)
assume the role of human inspectors for high-voltage
power transmission lines.
Method:
In the first phase, we use an insulator positioning-restoration-cropping approach, with RetinaNet for insulator localization. We then identify the primary region of interest (RoI) by selecting the largest bounding box detected by RetinaNet. In the second phase, where limited training data creates a risk of overfitting, we apply few-shot learning to mitigate the issue by reusing existing knowledge. DarkNet-53 serves as the backbone for processing insulator images, complemented by a meta feature extractor and a multi-scale weight generator, which together provide the feature reweighting needed for few-shot object detection. The multi-scale weight generator, built from an FPN and three small CNNs, recalibrates the feature map generated by the meta feature extractor, accentuating the meta-features that matter for novel object detection. Finally, we output bounding boxes outlining the largest insulator string within the image, along with any anomalous segments.