Search Results (190)

Search Parameters:
Keywords = remote sensing image annotation

24 pages, 9871 KiB  
Article
AIR-POLSAR-CR1.0: A Benchmark Dataset for Cloud Removal in High-Resolution Optical Remote Sensing Images with Fully Polarized SAR
by Yuxi Wang, Wenjuan Zhang, Jie Pan, Wen Jiang, Fangyan Yuan, Bo Zhang, Xijuan Yue and Bing Zhang
Remote Sens. 2025, 17(2), 275; https://doi.org/10.3390/rs17020275 - 14 Jan 2025
Viewed by 425
Abstract
Because synthetic aperture radar (SAR) data can be acquired day and night in all weather, they have become an important input for optical image restoration, and various SAR-optical cloud removal datasets have been proposed. However, existing multi-source cloud removal datasets are typically built from single- or dual-polarization backscatter SAR feature images, which do not fully describe target scattering and polarization characteristics. This paper constructs a high-resolution remote sensing dataset, AIR-POLSAR-CR1.0, from optical images, backscatter feature images, and polarization feature images derived from fully polarimetric synthetic aperture radar (PolSAR) data. The dataset has been manually annotated to provide a foundation for subsequent analysis and processing. Finally, the study analyzes the performance of typical deep learning cloud removal algorithms across different categories and cloud coverage levels on the proposed dataset, providing baseline results for the benchmark. An ablation experiment further demonstrates the benefit of the PolSAR data. In summary, AIR-POLSAR-CR1.0 fills the gap in polarization feature images and is well suited to the development of deep learning algorithms. Full article

31 pages, 2895 KiB  
Review
When Remote Sensing Meets Foundation Model: A Survey and Beyond
by Chunlei Huo, Keming Chen, Shuaihao Zhang, Zeyu Wang, Heyu Yan, Jing Shen, Yuyang Hong, Geqi Qi, Hongmei Fang and Zihan Wang
Remote Sens. 2025, 17(2), 179; https://doi.org/10.3390/rs17020179 - 7 Jan 2025
Viewed by 418
Abstract
Most deep-learning-based vision tasks rely heavily on crowd-labeled data, and training a deep neural network (DNN) is constrained by this laborious and time-consuming labeling paradigm. Recently, foundation models (FMs) have been proposed to learn richer features from multi-modal data, and a single foundation model can make zero-shot predictions across various vision tasks. These advantages make foundation models well suited to remote sensing images, where annotations are scarcer. However, the inherent differences between natural images and remote sensing images hinder the application of foundation models. In this context, this paper provides a comprehensive review of common and domain-specific foundation models for remote sensing, summarizing the latest advances in vision foundation models, textually prompted foundation models, visually prompted foundation models, and heterogeneous foundation models. Despite the great potential of foundation models for vision tasks, open challenges concerning data, models, and tasks limit their performance on remote sensing images and keep them far from practical application. To reduce the performance gap between natural and remote sensing images, the paper discusses these open challenges and suggests directions for future work. Full article
(This article belongs to the Section Remote Sensing Image Processing)
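
The zero-shot prediction ability mentioned in this survey is usually realized by embedding an image and a set of candidate class descriptions into a shared space and ranking classes by similarity. The sketch below illustrates only that idea in plain PyTorch; the random `image_encoder`, toy `text_encoder`, and the scene prompts are placeholders I introduce for illustration, not the API of any specific foundation model discussed in the paper.

```python
import torch
import torch.nn as nn

# Placeholder encoders standing in for a frozen, pretrained vision/text foundation model.
# They are random projections here so the sketch runs end to end.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512))
text_encoder = nn.Embedding(1000, 512)   # toy stand-in: one vector per "prompt id"

class_prompts = ["airport", "farmland", "forest", "harbor"]   # candidate scene labels
prompt_ids = torch.arange(len(class_prompts))                 # toy tokenization

with torch.no_grad():
    img = torch.rand(1, 3, 64, 64)                  # one remote sensing image tile (toy)
    img_emb = image_encoder(img)                    # (1, 512)
    txt_emb = text_encoder(prompt_ids)              # (num_classes, 512)

    # L2-normalize and rank classes by cosine similarity: zero-shot classification.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    probs = (img_emb @ txt_emb.T).softmax(dim=-1)

print({c: round(p.item(), 3) for c, p in zip(class_prompts, probs[0])})
```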

23 pages, 2311 KiB  
Article
Semi-Supervised Change Detection with Data Augmentation and Adaptive Thresholding for High-Resolution Remote Sensing Images
by Wuxia Zhang, Xinlong Shu, Siyuan Wu and Songtao Ding
Remote Sens. 2025, 17(2), 178; https://doi.org/10.3390/rs17020178 - 7 Jan 2025
Viewed by 450
Abstract
Change detection (CD) is an important research direction in the field of remote sensing, which aims to analyze the changes in the same area over different periods and is widely used in urban planning and environmental protection. While supervised learning methods in change detection have demonstrated substantial efficacy, they are often hindered by the rising costs associated with data annotation. Semi-supervised methods have attracted increasing interest, offering promising results with limited data labeling. These approaches typically employ strategies such as consistency regularization, pseudo-labeling, and generative adversarial networks. However, they usually face the problems of insufficient data augmentation and unbalanced quality and quantity of pseudo-labeling. To address the above problems, we propose a semi-supervised change detection method with data augmentation and adaptive threshold updating (DA-AT) for high-resolution remote sensing images. Firstly, a channel-level data augmentation (CLDA) technique is designed to enhance the strong augmentation effect and improve consistency regularization so as to address the problem of insufficient feature representation. Secondly, an adaptive threshold (AT) is proposed to dynamically adjust the threshold during the training process to balance the quality and quantity of pseudo-labeling so as to optimize the self-training process. Finally, an adaptive class weight (ACW) mechanism is proposed to alleviate the impact of the imbalance between the changed classes and the unchanged classes, which effectively enhances the learning ability of the model for the changed classes. We verify the effectiveness and robustness of the proposed method on two high-resolution remote sensing image datasets, WHU-CD and LEVIR-CD. We compare our method to five state-of-the-art change detection methods and show that it achieves better or comparable results. Full article
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)
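
The adaptive threshold (AT) idea in the abstract above, keeping only sufficiently confident pseudo-labels and adjusting the cutoff as training proceeds, can be illustrated with a small NumPy sketch. DA-AT's exact update rule is not given here, so the EMA-style update and the simulated confidences below are assumptions for illustration only.

```python
import numpy as np

def select_pseudo_labels(probs, threshold):
    """Keep pixels whose max class probability exceeds the current threshold."""
    conf = probs.max(axis=-1)          # per-pixel confidence
    labels = probs.argmax(axis=-1)     # per-pixel pseudo-label
    mask = conf >= threshold           # which pixels survive
    return labels, mask

rng = np.random.default_rng(0)
threshold, momentum = 0.8, 0.9         # assumed initial cutoff and EMA momentum

for step in range(5):
    # Fake softmax outputs for a 64x64 unlabeled tile, 2 classes (changed / unchanged).
    logits = rng.normal(size=(64, 64, 2)) + step * 0.3   # confidence grows over training
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

    labels, mask = select_pseudo_labels(probs, threshold)

    # EMA-style update: track the mean confidence of accepted pixels so the cutoff
    # rises as the model grows more confident (an assumed rule, not DA-AT's own).
    if mask.any():
        threshold = momentum * threshold + (1 - momentum) * probs.max(-1)[mask].mean()
    print(f"step {step}: kept {mask.mean():.2%} of pixels, threshold={threshold:.3f}")
```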

16 pages, 9121 KiB  
Technical Note
A Benchmark Dataset for Aircraft Detection in Optical Remote Sensing Imagery
by Jianming Hu, Xiyang Zhi, Bingxian Zhang, Tianjun Shi, Qi Cui and Xiaogang Sun
Remote Sens. 2024, 16(24), 4699; https://doi.org/10.3390/rs16244699 - 17 Dec 2024
Viewed by 584
Abstract
Existing aircraft detection datasets rarely consider both the diversity of target features and the complexity of environmental factors, which restricts the effectiveness and reliability of aircraft detection algorithms. Although considerable research has targeted few-sample aircraft detection, most algorithms still struggle with missed detections and false alarms caused by environmental interference in bird's-eye optical remote sensing scenes. To advance aircraft detection research, we established a new dataset, Aircraft Detection in Complex Optical Scene (ADCOS), sourced from platforms including Google Earth, Microsoft Map, WorldView-3, Pleiades, Ikonos, OrbView-3, and Jilin-1. It integrates 3903 carefully chosen images of over 400 well-known airports worldwide, containing 33,831 instances annotated in the oriented bounding box (OBB) format. Notably, the dataset covers a wide range of target characteristics, including multi-scale, multi-direction, multi-type, multi-state, and densely arranged targets, along with complex target-background relationships such as cluttered backgrounds, low contrast, shadows, and occlusion. Furthermore, we evaluated nine representative detection algorithms on ADCOS, establishing a performance benchmark for subsequent algorithm optimization. The latest dataset will soon be available on GitHub. Full article
(This article belongs to the Section Earth Observation Data)
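
ADCOS annotates instances with oriented bounding boxes (OBBs). A common OBB parameterization is (center x, center y, width, height, rotation angle); the sketch below converts that form to the four corner points, a step most OBB pipelines need. The (cx, cy, w, h, θ) convention is an assumption here, since the dataset's exact annotation schema is not spelled out in the abstract.

```python
import numpy as np

def obb_to_corners(cx, cy, w, h, angle_deg):
    """Convert an oriented box (center, size, rotation in degrees) to 4 corner points."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # Axis-aligned corners around the origin, then rotate and translate.
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ rot.T + np.array([cx, cy])

# Example: a 60x20 px aircraft-like box centered at (100, 80), rotated 30 degrees.
print(obb_to_corners(100, 80, 60, 20, 30).round(1))
```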

31 pages, 13252 KiB  
Article
GLCANet: Global–Local Context Aggregation Network for Cropland Segmentation from Multi-Source Remote Sensing Images
by Jinglin Zhang, Yuxia Li, Zhonggui Tong, Lei He, Mingheng Zhang, Zhenye Niu and Haiping He
Remote Sens. 2024, 16(24), 4627; https://doi.org/10.3390/rs16244627 - 10 Dec 2024
Cited by 1 | Viewed by 583
Abstract
Cropland is a fundamental basis for agricultural development and a prerequisite for ensuring food security. The segmentation and extraction of croplands using remote sensing images are important measures and prerequisites for detecting and protecting farmland. This study addresses the challenges of diverse image sources, multi-scale representations of cropland, and the confusion of features between croplands and other land types in large-area remote sensing image information extraction. To this end, a multi-source self-annotated dataset was developed using satellite images from GaoFen-2, GaoFen-7, and WorldView, which was integrated with public datasets GID and LoveDA to create the CRMS dataset. A novel semantic segmentation network, the Global–Local Context Aggregation Network (GLCANet), was proposed. This method integrates the Bilateral Feature Encoder (BFE) of CNNs and Transformers with a global–local information mining module (GLM) to enhance global context extraction and improve cropland separability. It also employs a multi-scale progressive upsampling structure (MPUS) to refine the accuracy of diverse arable land representations from multi-source imagery. To tackle the issue of inconsistent features within the cropland class, a loss function based on hard sample mining and multi-scale features was constructed. The experimental results demonstrate that GLCANet improves OA and mIoU by 3.2% and 2.6%, respectively, compared to the existing advanced networks on the CRMS dataset. Additionally, the proposed method also demonstrated high precision and practicality in segmenting large-area croplands in Chongzhou City, Sichuan Province, China. Full article
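
The OA and mIoU gains quoted above are standard semantic segmentation metrics computed from a class confusion matrix. A small generic reference implementation follows; it is not tied to GLCANet or the CRMS dataset.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Accumulate a (num_classes x num_classes) confusion matrix from label maps."""
    idx = gt.astype(int) * num_classes + pred.astype(int)
    return np.bincount(idx.ravel(), minlength=num_classes ** 2).reshape(num_classes, num_classes)

def oa_and_miou(cm):
    oa = np.diag(cm).sum() / cm.sum()                 # overall accuracy
    union = cm.sum(0) + cm.sum(1) - np.diag(cm)       # per-class union
    iou = np.diag(cm) / np.maximum(union, 1)          # per-class IoU
    return oa, iou.mean()

# Toy example: 2 classes (cropland / background) on a 4-pixel map.
gt   = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 1, 1])
oa, miou = oa_and_miou(confusion_matrix(pred, gt, num_classes=2))
print(f"OA={oa:.2f}, mIoU={miou:.2f}")   # OA=0.75, mIoU~0.58
```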

19 pages, 7044 KiB  
Article
JointNet4BCD: A Semi-Supervised Joint Learning Neural Network with Decision Fusion for Building Change Detection
by Hao Chen, Chengzhe Sun, Jun Li and Chun Du
Remote Sens. 2024, 16(23), 4569; https://doi.org/10.3390/rs16234569 - 5 Dec 2024
Viewed by 738
Abstract
Remote sensing image building change detection aims to identify building changes that occur in remote sensing images of the same areas acquired at different times. In recent years, the development of deep learning has led to significant advancements in building change detection methods. However, these fully supervised methods require a large number of bi-temporal remote sensing images with pixel-wise change detection labels to train the model, which incurs substantial time and manpower for annotation. To address this issue, this study proposes a novel single-temporal semi-supervised joint learning framework for building change detection, called JointNet4BCD. Firstly, to reduce annotation costs, we design a semi-supervised learning manner to train our model using a small number of building extraction labels instead of a large amount of building change detection labels. Furthermore, to improve the semantic understanding capability of the model, we propose a joint learning approach for building extraction and change detection tasks. Lastly, a decision fusion block is designed to fuse the building extraction results into the building change detection results to further improve the accuracy of building change detection. Experimental results on the two widely used datasets demonstrate that the proposed JointNet4BCD achieves excellent building change detection performance while reducing the need for labels from thousands to dozens. Using only ten labeled images, JointNet4BCD achieved F1-Scores of 83.93% and 83.45% on the LEVIR2000 and WHU datasets, respectively. Full article

23 pages, 10799 KiB  
Article
OMAD-6: Advancing Offshore Mariculture Monitoring with a Comprehensive Six-Type Dataset and Performance Benchmark
by Zewen Mo, Yinyu Liang, Yulin Chen, Yanyun Shen, Minduan Xu, Zhipan Wang and Qingling Zhang
Remote Sens. 2024, 16(23), 4522; https://doi.org/10.3390/rs16234522 - 2 Dec 2024
Viewed by 700
Abstract
Offshore mariculture is critical for global food security and economic development. Advances in deep learning and data-driven approaches enable rapid and effective monitoring of offshore mariculture distribution and changes. However, detector performance depends heavily on training data quality, and the lack of standardized classifications and public datasets for offshore mariculture facilities currently hampers effective monitoring. Here, we propose to categorize offshore mariculture facilities into six types: TCC, DWCC, FRC, LC, RC, and BC. Based on these categories, we introduce a benchmark dataset called OMAD-6. This dataset includes over 130,000 instances and more than 16,000 high-resolution remote sensing images. The images, with a spatial resolution of 0.6 m, were sourced from key regions in China, Chile, Norway, and Egypt via the Google Earth platform. All instances in OMAD-6 were meticulously annotated manually with horizontal bounding boxes and polygons. Compared to existing remote sensing datasets, OMAD-6 has three notable characteristics: (1) it is comparable to large, published datasets in instances per category, image quantity, and sample coverage; (2) it exhibits high inter-class similarity; (3) it shows significant intra-class diversity in facility sizes and arrangements. Based on the OMAD-6 dataset, we evaluated eight state-of-the-art methods to establish baselines for future research. The experimental results demonstrate that OMAD-6 effectively represents real-world scenarios that pose considerable challenges for current instance segmentation algorithms, and our evaluation confirms its potential to improve offshore mariculture identification. Notably, the QueryInst and PointRend algorithms were the top performers on OMAD-6, robustly identifying offshore mariculture facilities even against complex environmental backgrounds. The dataset's ongoing development and application will play a pivotal role in future offshore mariculture identification and management. Full article
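
Each OMAD-6 instance carries both a horizontal bounding box and a polygon outline. The abstract does not give the file format, so the COCO-style record below is only an assumed illustration of what one such annotation might look like; all field names and values are hypothetical.

```python
import json

# Hypothetical single-instance record in a COCO-like layout (field names assumed).
annotation = {
    "image_id": 1,
    "category": "FRC",                   # one of the six facility types (TCC, DWCC, FRC, LC, RC, BC)
    "bbox": [412.0, 230.5, 96.0, 64.0],  # horizontal box: [x, y, width, height] in pixels
    "segmentation": [[412.0, 230.5, 508.0, 236.0, 500.0, 294.5, 418.0, 288.0]],  # polygon x,y pairs
    "area": 5890.0,
    "iscrowd": 0,
}
print(json.dumps(annotation, indent=2))
```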

19 pages, 4031 KiB  
Article
MSTrans: Multi-Scale Transformer for Building Extraction from HR Remote Sensing Images
by Fei Yang, Fenlong Jiang, Jianzhao Li and Lei Lu
Electronics 2024, 13(23), 4610; https://doi.org/10.3390/electronics13234610 - 22 Nov 2024
Viewed by 538
Abstract
Buildings are among the most prominent products of human modification of the Earth's surface, so building extraction (BE) is of practical value in applications such as urban resource management and planning. Computational intelligence techniques based on convolutional neural networks (CNNs) and Transformers have attracted interest in BE and have made some progress. However, CNN-based BE methods have difficulty capturing global long-range relationships, while Transformer-based methods, which focus on global information, are often not detailed enough for pixel-level annotation tasks. To overcome these limitations, a multi-scale Transformer (MSTrans) is proposed for BE from high-resolution remote sensing images. In the proposed MSTrans, we develop a plug-and-play multi-scale Transformer (MST) module based on atrous spatial pyramid pooling (ASPP). The MST module can effectively capture tokens at different scales through its Transformer encoder and decoder, enhancing multi-scale feature extraction of buildings and thereby improving BE performance. Experiments on three real and challenging BE datasets verify the effectiveness of the proposed MSTrans. While the proposed approach may not achieve the highest Precision and Recall compared with the seven benchmark methods, it improves the overall metrics F1 and mIoU by 0.4% and 1.67%, respectively. Full article
(This article belongs to the Special Issue Emerging Technologies in Computational Intelligence)
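
The MST module builds on atrous spatial pyramid pooling (ASPP), which runs parallel dilated convolutions to gather context at several scales before tokens are handed to the Transformer encoder and decoder. Below is a hedged PyTorch sketch of a plain ASPP block only; the dilation rates and the way MSTrans attaches its Transformer stages are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel dilated 3x3 convolutions plus a 1x1 branch, fused by a 1x1 conv."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):   # rates are assumed
        super().__init__()
        self.branches = nn.ModuleList()
        for r in rates:
            k, p, d = (1, 0, 1) if r == 1 else (3, r, r)
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=p, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)))
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

# Each scale's feature map could then be flattened into tokens for a Transformer stage.
feats = torch.rand(1, 256, 32, 32)        # backbone features for one image tile
print(ASPP(256, 64)(feats).shape)         # -> torch.Size([1, 64, 32, 32])
```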

15 pages, 6433 KiB  
Technical Note
RSPS-SAM: A Remote Sensing Image Panoptic Segmentation Method Based on SAM
by Zhuoran Liu, Zizhen Li, Ying Liang, Claudio Persello, Bo Sun, Guangjun He and Lei Ma
Remote Sens. 2024, 16(21), 4002; https://doi.org/10.3390/rs16214002 - 28 Oct 2024
Cited by 1 | Viewed by 1329
Abstract
Satellite remote sensing images contain complex and diverse ground object information and exhibit spatial multi-scale characteristics, making panoptic segmentation of satellite remote sensing images a highly challenging task. Due to the lack of large-scale annotated datasets for panoptic segmentation, existing methods still suffer from weak generalization. To mitigate this issue, this paper leverages the advantages of the Segment Anything Model (SAM), which can segment any object in remote sensing images without requiring annotations, and proposes a high-resolution remote sensing image panoptic segmentation method called Remote Sensing Panoptic Segmentation SAM (RSPS-SAM). Firstly, to address the loss of global information caused by cropping large remote sensing images for training, a Batch Attention Pyramid was designed to extract multi-scale features and capture long-range contextual information between cropped patches, thereby enhancing semantic understanding of the imagery. Secondly, we constructed a Mask Decoder to address SAM's requirement for manual input prompts and its inability to output category information; the decoder uses mask-based attention for mask segmentation, enabling automatic prompt generation and category prediction for segmented objects. Finally, the effectiveness of the proposed method was validated on the high-resolution remote sensing airport scene dataset RSAPS-ASD. The results demonstrate that the proposed method segments and recognizes foreground instances and background regions in high-resolution remote sensing images without prompt input, provides smooth segmentation boundaries, and achieves a panoptic quality (PQ) of 57.2, outperforming current mainstream methods. Full article
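
The panoptic quality (PQ) score of 57.2 reported above combines segmentation and recognition quality: PQ = Σ IoU over matched segment pairs / (|TP| + ½|FP| + ½|FN|), where a match requires IoU > 0.5. The generic sketch below computes PQ from per-segment IoUs; it is not RSPS-SAM's evaluation code.

```python
def panoptic_quality(matched_ious, num_pred_unmatched, num_gt_unmatched):
    """PQ = sum of IoUs over matched segment pairs / (TP + 0.5*FP + 0.5*FN).

    matched_ious: IoUs of predicted/ground-truth segment pairs with IoU > 0.5 (the TPs).
    num_pred_unmatched: predicted segments with no match (FPs).
    num_gt_unmatched: ground-truth segments with no match (FNs).
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_pred_unmatched + 0.5 * num_gt_unmatched
    if denom == 0:
        return 0.0
    sq = sum(matched_ious) / max(tp, 1)      # segmentation quality
    rq = tp / denom                          # recognition quality
    return sq * rq                           # PQ = SQ * RQ

# Toy example: three matched segments, one false positive, one missed segment -> PQ = 0.6.
print(round(panoptic_quality([0.9, 0.8, 0.7], num_pred_unmatched=1, num_gt_unmatched=1), 3))
```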

21 pages, 5465 KiB  
Article
Deep Learning Approaches for Wildfire Severity Prediction: A Comparative Study of Image Segmentation Networks and Visual Transformers on the EO4WildFires Dataset
by Dimitris Sykas, Dimitrios Zografakis and Konstantinos Demestichas
Fire 2024, 7(11), 374; https://doi.org/10.3390/fire7110374 - 23 Oct 2024
Viewed by 1799
Abstract
This paper investigates the applicability of deep learning models for predicting the severity of forest wildfires, using an innovative benchmark dataset called EO4WildFires. EO4WildFires integrates multispectral imagery from Sentinel-2, SAR data from Sentinel-1, and meteorological data from NASA Power, annotated with EFFIS data for forest fire detection and size estimation. The data cover 45 countries and 31,730 wildfire events from 2018 to 2022. These sources are archived as data cubes with the aim of assessing wildfire severity from both current and historical forest conditions, drawing on a broad range of variables including temperature, precipitation, and soil moisture. The experimental setup tests the effectiveness of different deep learning architectures in predicting the size and shape of wildfire-burned areas. The study incorporates both image segmentation networks and visual transformers, with a consistent experimental design across models to ensure comparability of results. Adjustments were made to the training data, such as excluding empty labels and very small events, to focus on more significant wildfire events and potentially improve prediction accuracy. Model performance was evaluated with the F1 score, IoU score, and Average Percentage Difference (aPD), which together capture precision, sensitivity, and the accuracy of the burned-area estimate. Through extensive testing of the final model, which uses LinkNet and ResNet-34 as backbones, we obtained the following results on the test set when all available samples were used: 0.86 F1 score, 0.75 IoU, and 70% aPD. When empty labels were excluded during training and testing, performance improved significantly: 0.87 F1 score, 0.77 IoU, and 44.8% aPD. This indicates that the number of samples, as well as their respective size (area), affects the model's robustness. This restriction is well known in the remote sensing domain, as accessible, accurately labeled data may be limited. Visual transformers like TeleViT showed potential but underperformed segmentation networks in terms of F1 and IoU scores. Full article
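
F1 and IoU above are the usual pixel-wise overlap metrics for the predicted burned-area mask, while aPD compares predicted and actual burned area. The exact aPD definition is not given in the abstract, so the |predicted − true| / true formulation below is an assumption.

```python
import numpy as np

def burned_area_metrics(pred_mask, true_mask):
    """Pixel-wise F1 and IoU, plus an assumed area percentage difference (aPD)."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    tp = np.logical_and(pred, true).sum()
    fp = np.logical_and(pred, ~true).sum()
    fn = np.logical_and(~pred, true).sum()
    f1 = 2 * tp / max(2 * tp + fp + fn, 1)
    iou = tp / max(tp + fp + fn, 1)
    apd = abs(pred.sum() - true.sum()) / max(true.sum(), 1) * 100   # assumed definition
    return f1, iou, apd

rng = np.random.default_rng(1)
true = rng.random((128, 128)) > 0.7                          # toy burned-area ground truth
pred = np.logical_and(true, rng.random((128, 128)) > 0.1)    # model misses ~10% of it
f1, iou, apd = burned_area_metrics(pred, true)
print(f"F1={f1:.2f}, IoU={iou:.2f}, aPD={apd:.1f}%")
```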

15 pages, 2289 KiB  
Technical Note
Detection of Complex Formations in an Inland Lake from Sentinel-2 Images Using Atmospheric Corrections and a Fully Connected Deep Neural Network
by Damianos F. Mantsis, Anastasia Moumtzidou, Ioannis Lioumbas, Ilias Gialampoukidis, Aikaterini Christodoulou, Alexandros Mentes, Stefanos Vrochidis and Ioannis Kompatsiaris
Remote Sens. 2024, 16(20), 3913; https://doi.org/10.3390/rs16203913 - 21 Oct 2024
Viewed by 790
Abstract
The detection of complex formations, initially suspected to be oil spills, is investigated using atmospherically corrected multispectral satellite images and deep learning techniques. Several formations have been detected in an inland lake in Northern Greece. Four atmospheric corrections (ACOLITE, iCOR, Polymer, and C2RCC) that are specifically designed for water applications are examined and implemented on Sentinel-2 multispectral satellite images to eliminate the influence of the atmosphere. Out of the four algorithms, iCOR and ACOLITE are able to depict the formations sufficiently; however, the latter is chosen for further processing due to fewer uncertainties in the depiction of these formations as anomalies across the multispectral range. Furthermore, a number of formations are annotated at the pixel level for the 10 m bands (red, green, blue, and NIR), and a deep neural network (DNN) is trained and validated. Our results show that the four-band configuration provides the best model for the detection of these complex formations. Despite not being necessarily related to oil spills, studying these formations is crucial for environmental monitoring, pollution detection, and the advancement of remote sensing techniques. Full article
(This article belongs to the Section Ocean Remote Sensing)
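
The network described above classifies each pixel from its four 10 m band values (red, green, blue, NIR). A minimal fully connected sketch in PyTorch is shown below; the layer sizes, class set, and training details are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Per-pixel classifier: 4 reflectance values in, formation / background score out.
model = nn.Sequential(
    nn.Linear(4, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 2),                 # two classes assumed: formation vs. water background
)

# Toy batch of 1024 atmospherically corrected pixels (R, G, B, NIR reflectances).
pixels = torch.rand(1024, 4)
labels = torch.randint(0, 2, (1024,))

loss = nn.CrossEntropyLoss()(model(pixels), labels)
loss.backward()                        # one illustrative training step
print(f"toy cross-entropy loss: {loss.item():.3f}")
```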

20 pages, 24465 KiB  
Article
Unsupervised Multi-Scale Hybrid Feature Extraction Network for Semantic Segmentation of High-Resolution Remote Sensing Images
by Wanying Song, Fangxin Nie, Chi Wang, Yinyin Jiang and Yan Wu
Remote Sens. 2024, 16(20), 3774; https://doi.org/10.3390/rs16203774 - 11 Oct 2024
Viewed by 1452
Abstract
Generating pixel-level annotations for semantic segmentation tasks of high-resolution remote sensing images is both time-consuming and labor-intensive, which has led to increased interest in unsupervised methods. Therefore, in this paper, we propose an unsupervised multi-scale hybrid feature extraction network based on the CNN-Transformer architecture, referred to as MSHFE-Net. The MSHFE-Net consists of three main modules: a Multi-Scale Pixel-Guided CNN Encoder, a Multi-Scale Aggregation Transformer Encoder, and a Parallel Attention Fusion Module. The Multi-Scale Pixel-Guided CNN Encoder is designed for multi-scale, fine-grained feature extraction in unsupervised tasks, efficiently recovering local spatial information in images. Meanwhile, the Multi-Scale Aggregation Transformer Encoder introduces a multi-scale aggregation module, which further enhances the unsupervised acquisition of multi-scale contextual information, obtaining global features with stronger feature representation. The Parallel Attention Fusion Module employs an attention mechanism to fuse global and local features in both channel and spatial dimensions in parallel, enriching the semantic relations extracted during unsupervised training and improving the performance of unsupervised semantic segmentation. K-means clustering is then performed on the fused features to achieve high-precision unsupervised semantic segmentation. Experiments with MSHFE-Net on the Potsdam and Vaihingen datasets demonstrate its effectiveness in significantly improving the accuracy of unsupervised semantic segmentation. Full article
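
As the abstract notes, MSHFE-Net's final step clusters the fused per-pixel features with K-means to obtain the unsupervised segmentation map. The scikit-learn sketch below shows only that last step, on random features; the number of clusters and the feature dimensionality are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

H, W, D = 64, 64, 32                      # tile size and fused feature dimension (assumed)
features = np.random.rand(H, W, D)        # stand-in for fused CNN + Transformer features

# Cluster every pixel's feature vector; each cluster id becomes a segmentation label.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
labels = kmeans.fit_predict(features.reshape(-1, D)).reshape(H, W)

print(labels.shape, np.unique(labels))    # (64, 64) with cluster ids 0..5
```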

24 pages, 10093 KiB  
Article
Enhancing a You Only Look Once-Plated Detector via Auxiliary Textual Coding for Multi-Scale Rotating Remote Sensing Objects in Transportation Monitoring Applications
by Sarentuya Bao, Mingwang Zhang, Rui Xie, Dabhvrbayar Huang and Jianlei Kong
Appl. Sci. 2024, 14(19), 9074; https://doi.org/10.3390/app14199074 - 8 Oct 2024
Cited by 1 | Viewed by 1042
Abstract
With the rapid development of intelligent information technologies, remote sensing object detection has played an important role in different field applications. Particularly in recent years, it has attracted widespread attention in assisting with food safety supervision, which still faces troubling issues between oversized parameters and low performance that are challenging to solve. Hence, this article proposes a novel remote sensing detection framework for multi-scale objects with a rotating status and mutual occlusion, defined as EYMR-Net. This proposed approach is established on the YOLO-v7 architecture with a Swin Transformer backbone, which offers multi-scale receptive fields to mine massive features. Then, an enhanced attention module is added to exploit the spatial and dimensional interrelationships among different local characteristics. Subsequently, the effective rotating frame regression mechanism via circular smoothing labels is introduced to the EYMR-Net structure, addressing the problem of horizontal YOLO (You Only Look Once) frames ignoring direction changes. Extensive experiments on DOTA datasets demonstrated the outstanding performance of EYMR-Net, which achieved an impressive mAP0.5 of up to 74.3%. Further ablation experiments verified that our proposed approach obtains a balance between performance and efficiency, which is beneficial for practical remote sensing applications in transportation monitoring and supply chain management. Full article
(This article belongs to the Special Issue Deep Learning in Satellite Remote Sensing Applications)
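
Circular smooth labels turn the box angle into a classification target: the angle range is discretized into bins and a circular window (often Gaussian) is placed around the true bin so neighboring angles are not penalized as hard errors. The sketch below encodes one angle that way; the 180 bins and Gaussian radius are common choices but are assumptions here, not EYMR-Net's exact settings.

```python
import numpy as np

def circular_smooth_label(angle_deg, num_bins=180, radius=6):
    """Encode an angle as a smoothed one-hot vector that wraps around at the ends."""
    bins = np.arange(num_bins)
    center = int(round(angle_deg)) % num_bins
    # Circular distance between each bin and the true angle bin.
    dist = np.minimum(np.abs(bins - center), num_bins - np.abs(bins - center))
    label = np.exp(-(dist ** 2) / (2 * radius ** 2))   # Gaussian window
    label[dist > radius] = 0.0                          # truncate far bins
    return label

csl = circular_smooth_label(2.0)           # an angle near the wrap-around point
print(csl[:5], csl[-3:])                   # non-zero weight also on bins 177-179
```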

19 pages, 9016 KiB  
Article
Semi-Supervised Subcategory Centroid Alignment-Based Scene Classification for High-Resolution Remote Sensing Images
by Nan Mo and Ruixi Zhu
Remote Sens. 2024, 16(19), 3728; https://doi.org/10.3390/rs16193728 - 7 Oct 2024
Cited by 1 | Viewed by 900
Abstract
It is usually hard to obtain adequate annotated data for delivering satisfactory scene classification results. Semi-supervised scene classification approaches can transfer knowledge learned from previously annotated data to remote sensing images with scarce samples. However, because of differences in sensors, environments, seasons, and geographical locations, cross-domain remote sensing images exhibit feature distribution deviations, so semi-supervised scene classification methods may not achieve satisfactory accuracy. To address this problem, a novel semi-supervised subcategory centroid alignment (SSCA)-based scene classification approach is proposed. The SSCA framework consists of two components: the rotation-robust convolutional feature extractor (RCFE) and the neighbor-based subcategory centroid alignment (NSCA). The RCFE suppresses the impact of rotation changes on remote sensing image representation, while the NSCA reduces the impact of intra-category variation across domains on cross-domain scene classification. The SSCA algorithm and several competing approaches are validated on two datasets to demonstrate its effectiveness. The results show that the proposed SSCA approach outperforms most competing approaches by at least 2% in overall accuracy. Full article
(This article belongs to the Special Issue Deep Transfer Learning for Remote Sensing II)
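
Centroid alignment methods reduce domain shift by pulling the per-class (or per-subcategory) feature centroids of the source and target domains toward each other. A toy PyTorch sketch of such an alignment loss follows; SSCA's neighbor-based subcategory construction is more involved, so treat this only as the general idea.

```python
import torch

def centroid_alignment_loss(src_feats, src_labels, tgt_feats, tgt_labels, num_classes):
    """Mean squared distance between per-class feature centroids of two domains."""
    loss, used = 0.0, 0
    for c in range(num_classes):
        src_c = src_feats[src_labels == c]
        tgt_c = tgt_feats[tgt_labels == c]          # target labels would be pseudo-labels
        if len(src_c) == 0 or len(tgt_c) == 0:
            continue                                # skip classes missing in a domain
        loss = loss + (src_c.mean(0) - tgt_c.mean(0)).pow(2).sum()
        used += 1
    return loss / max(used, 1)

src = torch.randn(100, 64)                          # source-domain scene features
tgt = torch.randn(80, 64) + 0.5                     # shifted target-domain features
loss = centroid_alignment_loss(src, torch.randint(0, 5, (100,)),
                               tgt, torch.randint(0, 5, (80,)), num_classes=5)
print(loss)
```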

18 pages, 16454 KiB  
Technical Note
Annotated Dataset for Training Cloud Segmentation Neural Networks Using High-Resolution Satellite Remote Sensing Imagery
by Mingyuan He, Jie Zhang, Yang He, Xinjie Zuo and Zebin Gao
Remote Sens. 2024, 16(19), 3682; https://doi.org/10.3390/rs16193682 - 2 Oct 2024
Cited by 1 | Viewed by 1387
Abstract
The integration of satellite data with deep learning has revolutionized various tasks in remote sensing, including classification, object detection, and semantic segmentation. Cloud segmentation in high-resolution satellite imagery is a critical application within this domain, yet progress in developing advanced algorithms for this task has been hindered by the scarcity of specialized datasets and annotation tools. This study addresses this challenge by introducing CloudLabel, a semi-automatic annotation technique leveraging region growing and morphological algorithms including flood fill, connected components, and guided filter. CloudLabel v1.0 streamlines the annotation process for high-resolution satellite images, thereby addressing the limitations of existing annotation platforms which are not specifically adapted to cloud segmentation, and enabling the efficient creation of high-quality cloud segmentation datasets. Notably, we have curated the Annotated Dataset for Training Cloud Segmentation (ADTCS) comprising 32,065 images (512 × 512) for cloud segmentation based on CloudLabel. The ADTCS dataset facilitates algorithmic advancement in cloud segmentation, characterized by uniform cloud coverage distribution and high image entropy (mainly 5–7). These features enable deep learning models to capture comprehensive cloud characteristics, enhancing recognition accuracy and reducing ground object misclassification. This contribution significantly advances remote sensing applications and cloud segmentation algorithms. Full article
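
CloudLabel's semi-automatic annotation starts from region growing: beginning at a seed pixel, neighboring pixels are absorbed while they stay within a brightness tolerance of the seed. The plain NumPy breadth-first sketch below shows only that core step; the tolerance value and 4-neighborhood are assumptions, and CloudLabel additionally applies flood fill, connected components, and guided filtering.

```python
import numpy as np
from collections import deque

def region_grow(image, seed, tol=0.15):
    """Grow a binary mask from `seed` over pixels within `tol` of the seed value."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = image[seed]
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-neighborhood
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] \
                    and abs(image[nr, nc] - seed_val) <= tol:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask

# Toy grayscale tile: a bright "cloud" blob on a darker background.
img = np.full((64, 64), 0.2)
img[20:40, 25:50] = 0.9
cloud_mask = region_grow(img, seed=(30, 30))
print(cloud_mask.sum(), "pixels annotated as cloud")   # 20*25 = 500
```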
