Search Results (40)

Search Parameters:
Keywords = neighborhood attention transformer

21 pages, 710 KiB  
Article
Efficient and Effective Unsupervised Entity Alignment in Large Knowledge Graphs
by Weishan Cai, Ruqi Zhou and Wenjun Ma
Appl. Sci. 2025, 15(4), 1976; https://doi.org/10.3390/app15041976 - 13 Feb 2025
Abstract
Entity Alignment (EA) in Knowledge Graphs (KGs) is a crucial task for the integration of multiple KGs, facilitating the amalgamation of multi-source knowledge and enhancing support for downstream applications. In recent years, unsupervised EA methods have demonstrated remarkable efficacy in leveraging graph structures or utilizing auxiliary information. However, the increasing complexity of many modeling methods limits their applicability to large KGs in real-world scenarios. Given that most EA encoders primarily focus on modeling one-hop neighborhoods within the KG’s graph structure while neglecting similarities among multi-hop neighborhoods, we propose an efficient and effective unsupervised EA method, MPGT-Align, based on a multi-hop pruning graph transformer. The core innovation of MPGT-Align lies in mining multi-hop neighborhood features of entities through two components: Pruning-hop2Token and an attention-based Transformer encoder. The former, inspired by search pruning algorithms, aggregates only those multi-hop neighborhoods that contribute to alignment targets. The latter empowers MPGT-Align to adaptively extract more effective alignment information from both the entity itself and its multi-hop neighbors. Furthermore, Pruning-hop2Token serves as a non-parametric method that not only reduces model parameters but also allows MPGT-Align to be trained with small batch sizes, thereby enabling efficient handling of large KGs. Extensive experiments conducted across various benchmark datasets demonstrate that our method consistently outperforms most existing supervised and unsupervised EA techniques. Full article
(This article belongs to the Special Issue Knowledge Graphs: State-of-the-Art and Applications)
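To make the Pruning-hop2Token idea above concrete, the following minimal sketch aggregates only the sufficiently similar multi-hop neighbors of each entity into per-hop tokens and feeds the short token sequence to a standard Transformer encoder. The function name `prune_hop2token`, the cosine-similarity pruning rule, and the mean aggregation are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def prune_hop2token(x, hop_neighbors, sim_threshold=0.3):
    """Aggregate each entity's multi-hop neighborhoods into per-hop tokens,
    keeping only neighbors that look useful for alignment (non-parametric).

    x:             (N, d) entity embeddings
    hop_neighbors: list over hops; hop_neighbors[k][i] is a LongTensor of
                   entity i's k-hop neighbor ids
    returns:       (N, H + 1, d) token sequence = [self token, hop tokens]
    """
    n, d = x.shape
    tokens = [x]                                   # hop-0 token: the entity itself
    x_norm = F.normalize(x, dim=-1)
    for neigh in hop_neighbors:
        hop_tok = torch.zeros(n, d)
        for i in range(n):
            ids = neigh[i]
            if len(ids) == 0:
                continue
            sim = x_norm[ids] @ x_norm[i]          # cosine similarity to the centre entity
            kept = ids[sim >= sim_threshold]       # prune neighbors unlikely to help alignment
            if len(kept) > 0:
                hop_tok[i] = x[kept].mean(dim=0)   # parameter-free aggregation
        tokens.append(hop_tok)
    return torch.stack(tokens, dim=1)

# A standard Transformer encoder can then attend over the short token sequence.
tokens = prune_hop2token(torch.randn(100, 64),
                         [[torch.randint(0, 100, (5,)) for _ in range(100)]])
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=1)
aligned_repr = encoder(tokens)[:, 0]               # use the self-token output for alignment
```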

26 pages, 107737 KiB  
Article
Optimizing Public Spaces for Age-Friendly Living: Renovation Strategies for 1980s Residential Communities in Hangzhou, China
by Min Gong, Ning Wang, Yubei Chu, Yiyao Wu, Jiadi Huang and Jing Wu
Buildings 2025, 15(2), 211; https://doi.org/10.3390/buildings15020211 - 12 Jan 2025
Viewed by 546
Abstract
Population aging and urbanization are two of the most significant social transformations of the 21st century. Against the backdrop of rapid aging in China, developing age-friendly community environments, particularly through the renovation of legacy residential communities, not only supports active and healthy aging but also promotes equity and sustainable development. This study focuses on residential communities built in the 1980s in Hangzhou, exploring strategies for the age-friendly renovation of outdoor public spaces. The residential communities that flourished during the construction boom of the 1980s are now confronting a dual challenge: aging populations and deteriorating facilities. However, existing renovation efforts often pay insufficient attention to the comprehensive age-friendly transformation of outdoor public spaces within these neighborhoods. Following a structured research framework encompassing investigation, evaluation, design, and discussion, this study first analyzes the linear grid layouts and usage patterns of these communities. The research team then uses post-occupancy evaluation (POE) to assess the age-friendliness of outdoor public spaces. Semi-structured interviews with elderly residents identify key concerns and establish a preliminary evaluation framework, while a Likert-scale questionnaire quantifies satisfaction with age-friendly features across four communities. The assessment reveals that key age-friendliness issues, including poor traffic safety, dispersed activity spaces, and insufficiently adapted facilities, are closely linked to the linear usage patterns within the spatial framework of the grid layouts. Based on the findings, the study develops tiered renovation goals and renovation principles and implements an age-friendly design in the Hemu Community. The strengths, weaknesses, and feasibility of the renovation plan are discussed, and three recommendations are made to ensure successful implementation. The study is intended to provide a valuable reference for advancing age-friendly residential renewal efforts in Hangzhou and to contribute to the broader objective of sustainable, inclusive city development. Full article
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

27 pages, 9095 KiB  
Article
BMFusion: Bridging the Gap Between Dark and Bright in Infrared-Visible Imaging Fusion
by Chengwen Liu, Bin Liao and Zhuoyue Chang
Electronics 2024, 13(24), 5005; https://doi.org/10.3390/electronics13245005 - 19 Dec 2024
Viewed by 609
Abstract
The fusion of infrared and visible light images is a crucial technology for enhancing visual perception in complex environments, and it plays a pivotal role in improving performance on subsequent advanced visual tasks. However, because visible light image quality degrades significantly in low-light or nighttime scenes, most existing fusion methods struggle to obtain sufficient texture details and salient features when processing such scenes, which can reduce fusion quality. To address this issue, this article proposes a new image fusion method called BMFusion. Its aim is to significantly improve the quality of fused images in low-light or nighttime scenes and to generate high-quality fused images around the clock. The article first designs a brightness attention module composed of brightness attention units, which extracts multimodal features by combining the SimAm attention mechanism with a Transformer architecture, effectively enhancing brightness and features while applying gradual brightness attention during feature extraction. Secondly, a complementary fusion module is designed that deeply fuses infrared and visible light features to ensure the complementarity and enhancement of each modal feature during the fusion process, minimizing information loss as far as possible. In addition, a feature reconstruction network combining CLIP-guided semantic vectors and neighborhood attention enhancement is proposed for the feature reconstruction stage; it uses the KAN module to perform channel-adaptive optimization of the reconstruction process, ensuring semantic consistency and detail integrity of the fused image. Experimental results on a large number of public datasets demonstrate that BMFusion generates fusion images with higher visual quality and richer details in night and low-light environments than various existing state-of-the-art (SOTA) algorithms, and that the fused images significantly improve the performance of advanced visual tasks. This shows the great potential and application prospects of the method in the field of multimodal image fusion. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
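The brightness attention units described above build on the SimAm mechanism; the sketch below shows the widely used parameter-free SimAM formulation as a stand-in, assuming a standard (B, C, H, W) feature layout. It is not the authors' exact module.

```python
import torch
import torch.nn as nn

class SimAMAttention(nn.Module):
    """Parameter-free SimAM-style attention: each activation is reweighted by an
    energy term measuring how much it stands out from its channel's mean."""
    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):                                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)      # squared deviation per position
        v = d.sum(dim=(2, 3), keepdim=True) / n                # channel variance estimate
        energy = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(energy)                        # salient/bright responses weighted up

feat = torch.randn(2, 32, 64, 64)
enhanced = SimAMAttention()(feat)                               # same shape, attention-reweighted
```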

18 pages, 2648 KiB  
Article
Semantic Scene Completion in Autonomous Driving: A Two-Stream Multi-Vehicle Collaboration Approach
by Junxuan Li, Yuanfang Zhang, Jiayi Han, Peng Han and Kaiqing Luo
Sensors 2024, 24(23), 7702; https://doi.org/10.3390/s24237702 - 2 Dec 2024
Viewed by 868
Abstract
Vehicle-to-vehicle communication enables capturing sensor information from diverse perspectives, greatly aiding semantic scene completion in autonomous driving. However, the misalignment of features between the ego vehicle and cooperative vehicles leads to ambiguity problems, affecting accuracy and semantic information. In this paper, we propose a Two-Stream Multi-Vehicle collaboration approach (TSMV), which divides the features of collaborative vehicles into two streams and regresses them interactively. To overcome the problems caused by feature misalignment, the Neighborhood Self-Cross Attention Transformer (NSCAT) module is designed to enable the ego vehicle to query the most similar local features from collaborative vehicles through cross-attention, rather than assuming spatial-temporal synchronization. A 3D occupancy map is finally generated from the aggregated features of the collaborative vehicles. Experimental results on both the V2VSSC and SemanticOPV2V datasets demonstrate that TSMV outperforms state-of-the-art collaborative semantic scene completion techniques. Full article
(This article belongs to the Special Issue Intelligent Sensing and Computing for Smart and Autonomous Vehicles)
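The core of the NSCAT idea, letting the ego vehicle query the most similar local features of a collaborating vehicle instead of assuming perfect spatial alignment, can be sketched as a windowed cross-attention. The single-head formulation and the 5 x 5 window below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def neighborhood_cross_attention(ego, coop, k=5):
    """Each ego-vehicle feature cell queries a k x k window of the cooperative
    vehicle's feature map, so small spatial misalignments are absorbed by
    attention instead of being assumed away.
    ego, coop: (B, C, H, W)  ->  fused: (B, C, H, W)
    """
    b, c, h, w = ego.shape
    pad = k // 2
    neigh = F.unfold(coop, kernel_size=k, padding=pad)          # (B, C*k*k, H*W)
    neigh = neigh.view(b, c, k * k, h * w)                       # local windows of coop features
    q = ego.view(b, c, 1, h * w)                                 # ego query per location
    attn = (q * neigh).sum(dim=1, keepdim=True) / c ** 0.5       # dot-product scores (B, 1, k*k, HW)
    attn = attn.softmax(dim=2)                                   # softmax over the window positions
    fused = (attn * neigh).sum(dim=2)                            # (B, C, HW)
    return fused.view(b, c, h, w)

out = neighborhood_cross_attention(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```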

29 pages, 4029 KiB  
Article
Compact DINO-ViT: Feature Reduction for Visual Transformer
by Didih Rizki Chandranegara, Przemysław Niedziela and Bogusław Cyganek
Electronics 2024, 13(23), 4694; https://doi.org/10.3390/electronics13234694 - 27 Nov 2024
Viewed by 569
Abstract
Research has been ongoing for years to discover image features that enable the best possible classification. One of the latest developments in this area is the Self-Distillation with No Labels Vision Transformer (DINO-ViT) features. However, even for a single image, their volume is significant. Therefore, in this article we propose to substantially reduce their size using two methods: Principal Component Analysis and Neighborhood Component Analysis. Our methods, PCA-DINO and NCA-DINO, reduce the feature volume substantially, often by more than an order of magnitude, while maintaining or only slightly reducing classification accuracy, as confirmed by numerous experiments. Additionally, we evaluated the Uniform Manifold Approximation and Projection (UMAP) method, showing the superiority of the PCA and NCA approaches. Our experiments involving modifications to patch size, attention heads, and noise insertion in DINO-ViT demonstrated that both PCA-DINO and NCA-DINO exhibit reliable accuracy. NCA-DINO is preferable for high-performance applications despite its higher computational cost, whereas PCA-DINO offers a faster, more resource-efficient solution, depending on the application-specific requirements. The code for our method is available on GitHub. Full article
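Reducing DINO-ViT features with PCA or NCA follows the standard scikit-learn pattern. The snippet below is a generic sketch: the random feature arrays and the 64-dimensional target are placeholders, not the paper's datasets or settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier

# Placeholder DINO-ViT features: rows are images, columns are extracted feature dimensions.
X_train, y_train = np.random.randn(500, 768), np.random.randint(0, 10, 500)
X_test = np.random.randn(100, 768)

# PCA-DINO: unsupervised, keeps the directions of maximal variance.
pca = PCA(n_components=64).fit(X_train)
X_train_pca, X_test_pca = pca.transform(X_train), pca.transform(X_test)

# NCA-DINO: supervised, learns a projection that helps nearest-neighbour classification.
nca = NeighborhoodComponentsAnalysis(n_components=64, max_iter=50).fit(X_train, y_train)
X_train_nca, X_test_nca = nca.transform(X_train), nca.transform(X_test)

# Downstream classifier on the reduced features (an order of magnitude smaller).
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train_nca, y_train)
pred = clf.predict(X_test_nca)
```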

16 pages, 9223 KiB  
Article
NATCA YOLO-Based Small Object Detection for Aerial Images
by Yicheng Zhu, Zhenhua Ai, Jinqiang Yan, Silong Li, Guowei Yang and Teng Yu
Information 2024, 15(7), 414; https://doi.org/10.3390/info15070414 - 18 Jul 2024
Viewed by 1736
Abstract
The object detection model in UAV aerial image scenes faces challenges such as significant scale changes of certain objects and the presence of complex backgrounds. This paper aims to address the detection of small objects in aerial images using NATCA (neighborhood attention Transformer coordinate attention) YOLO. Specifically, the feature extraction network incorporates a neighborhood attention transformer (NAT) into the last layer to capture global context information and extract diverse features. Additionally, the feature fusion network (Neck) incorporates a coordinate attention (CA) module to capture channel information and longer-range positional information. Furthermore, the activation function in the original convolutional block is replaced with Meta-ACON. The NAT serves as the prediction layer in the new network, which is evaluated using the VisDrone2019-DET object detection dataset as a benchmark, and tested on the VisDrone2019-DET-test-dev dataset. To assess the performance of the NATCA YOLO model in detecting small objects in aerial images, other detection networks, such as Faster R-CNN, RetinaNet, and SSD, are employed for comparison on the test set. The results demonstrate that the NATCA YOLO detection achieves an average accuracy of 42%, which is a 2.9% improvement compared to the state-of-the-art detection network TPH-YOLOv5. Full article
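The coordinate attention (CA) module inserted into the feature fusion network factorizes pooling along the height and width axes so the attention map carries both channel and long-range positional information. The sketch below follows the standard CA formulation; the reduction ratio and ReLU activation are assumed details.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention: pools features along H and W separately, then builds
    direction-aware attention maps that encode channel and positional information."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)                      # (B, C, H, 1)
        pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([pool_h, pool_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                     # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2))) # (B, C, 1, W)
        return x * a_h * a_w

out = CoordinateAttention(64)(torch.randn(1, 64, 40, 40))
```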

20 pages, 17657 KiB  
Article
DiT-Gesture: A Speech-Only Approach to Stylized Gesture Generation
by Fan Zhang, Zhaohan Wang, Xin Lyu, Naye Ji, Siyuan Zhao and Fuxing Gao
Electronics 2024, 13(9), 1702; https://doi.org/10.3390/electronics13091702 - 27 Apr 2024
Viewed by 1664
Abstract
The generation of co-speech gestures for digital humans is an emerging area in the field of virtual human creation. Prior research has progressed by using acoustic and semantic information as input and adopting a classification method to identify the person’s ID and emotion for driving co-speech gesture generation. However, this endeavor still faces significant challenges. These challenges go beyond the intricate interplay among co-speech gestures, speech acoustic, and semantics; they also encompass the complexities associated with personality, emotion, and other obscure but important factors. This paper introduces “DiT-Gestures”, a speech-conditional diffusion-based and non-autoregressive transformer-based generative model with the WavLM pre-trained model and a dynamic mask attention network (DMAN). It can produce individual and stylized full-body co-speech gestures by only using raw speech audio, eliminating the need for complex multimodal processing and manual annotation. Firstly, considering that speech audio contains acoustic and semantic features and conveys personality traits, emotions, and more subtle information related to accompanying gestures, we pioneer the adaptation of WavLM, a large-scale pre-trained model, to extract the style from raw audio information. Secondly, we replace the causal mask by introducing a learnable dynamic mask for better local modeling in the neighborhood of the target frames. Extensive subjective evaluation experiments are conducted on the Trinity, ZEGGS, and BEAT datasets to confirm WavLM’s and the model’s ability to synthesize natural co-speech gestures with various styles. Full article
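Extracting style-bearing features from raw speech with a pre-trained WavLM model follows the usual Hugging Face pattern. The sketch below covers only that step; the checkpoint name and the mean pooling into a style vector are illustrative assumptions, not the paper's full pipeline.

```python
import torch
from transformers import AutoFeatureExtractor, WavLMModel

# One publicly available checkpoint; any pre-trained WavLM weights would work similarly.
extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus")
wavlm = WavLMModel.from_pretrained("microsoft/wavlm-base-plus")

waveform = torch.randn(16000 * 4)                   # 4 s of 16 kHz speech (placeholder audio)
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden = wavlm(**inputs).last_hidden_state      # (1, T, 768) frame-level features

# Frame features can condition the gesture generator; a mean pool gives a coarse
# utterance-level "style" vector as a simple illustration.
style_vector = hidden.mean(dim=1)
```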

21 pages, 637 KiB  
Article
Function-Level Compilation Provenance Identification with Multi-Faceted Neural Feature Distillation and Fusion
by Yang Gao, Lunjin Liang, Yifei Li, Rui Li and Yu Wang
Electronics 2024, 13(9), 1692; https://doi.org/10.3390/electronics13091692 - 27 Apr 2024
Viewed by 1034
Abstract
In the landscape of software development, the selection of compilation tools and settings plays a pivotal role in the creation of executable binaries. This diversity, while beneficial, introduces significant challenges for reverse engineers and security analysts in deciphering the compilation provenance of binary code. To this end, we present MulCPI, short for Multi-representation Fusion-based Compilation Provenance Identification, which integrates the features collected from multiple distinct intermediate representations of the binary code for better discernment of the fine-grained function-level compilation details. In particular, we devise a novel graph-oriented neural encoder improved upon the gated graph neural network by subtly introducing an attention mechanism into the neighborhood nodes’ information aggregation computation, in order to better distill the more informative features from the attributed control flow graph. By further integrating the features collected from the normalized assembly sequence with an advanced Transformer encoder, MulCPI is capable of capturing a more comprehensive set of features manifesting the multi-faceted lexical, syntactic, and structural insights of the binary code. Extensive evaluation on a public dataset comprising 854,858 unique functions demonstrates that MulCPI exceeds the performance of current leading methods in identifying the compiler family, optimization level, compiler version, and the combination of compilation settings. It achieves average accuracy rates of 99.3%, 96.4%, 90.7%, and 85.3% on these tasks, respectively. Additionally, an ablation study highlights the significance of MulCPI’s core designs, validating the efficiency of the proposed attention-enhanced gated graph neural network encoder and the advantages of incorporating multiple code representations. Full article
(This article belongs to the Special Issue Machine Learning (ML) and Software Engineering, 2nd Edition)
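The attention mechanism introduced into the gated graph neural network's neighborhood aggregation can be pictured as weighting each incoming message before a GRU-style node update. The single layer below is a simplified illustration under that reading, not MulCPI's actual encoder.

```python
import torch
import torch.nn as nn

class AttnGatedGraphLayer(nn.Module):
    """One gated-graph-style update where neighbor messages are combined with
    learned attention weights instead of a plain sum."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.att = nn.Linear(2 * dim, 1)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, h, edges):                     # h: (N, d); edges: (E, 2) src -> dst
        src, dst = edges[:, 0], edges[:, 1]
        m = self.msg(h[src])                                          # messages along edges
        score = self.att(torch.cat([h[dst], m], dim=-1)).squeeze(-1)  # attention logits per edge
        score = score - score.max()                                   # numerical stability
        denom = torch.zeros(h.size(0)).index_add_(0, dst, score.exp())
        alpha = score.exp() / (denom[dst] + 1e-9)                     # softmax per destination node
        agg = torch.zeros_like(h).index_add_(0, dst, alpha.unsqueeze(-1) * m)
        return self.gru(agg, h)                                       # GRU-style node update

h = torch.randn(6, 32)
edges = torch.tensor([[0, 1], [2, 1], [3, 1], [4, 5]])
h_next = AttnGatedGraphLayer(32)(h, edges)
```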

18 pages, 1892 KiB  
Article
Research on Efficient Feature Generation and Spatial Aggregation for Remote Sensing Semantic Segmentation
by Ruoyang Li, Shuping Xiong, Yinchao Che, Lei Shi, Xinming Ma and Lei Xi
Algorithms 2024, 17(4), 151; https://doi.org/10.3390/a17040151 - 4 Apr 2024
Viewed by 1757
Abstract
Semantic segmentation algorithms leveraging deep convolutional neural networks often encounter challenges due to their extensive parameters, high computational complexity, and slow execution. To address these issues, we introduce a semantic segmentation network model emphasizing the rapid generation of redundant features and multi-level spatial aggregation. This model applies cost-efficient linear transformations instead of standard convolution operations during feature map generation, effectively managing memory usage and reducing computational complexity. To enhance the feature maps’ representation ability post-linear transformation, a specifically designed dual-attention mechanism is implemented, enhancing the model’s capacity for semantic understanding of both local and global image information. Moreover, the model integrates sparse self-attention with multi-scale contextual strategies, effectively combining features across different scales and spatial extents. This approach optimizes computational efficiency and retains crucial information, enabling precise and quick image segmentation. To assess the model’s segmentation performance, we conducted experiments in Changge City, Henan Province, using datasets such as LoveDA, PASCAL VOC, LandCoverNet, and DroneDeploy. These experiments demonstrated the model’s outstanding performance on public remote sensing datasets, significantly reducing the parameter count and computational complexity while maintaining high accuracy in segmentation tasks. This advancement offers substantial technical benefits for applications in agriculture and forestry, including land cover classification and crop health monitoring, thereby underscoring the model’s potential to support these critical sectors effectively. Full article
(This article belongs to the Special Issue Algorithms in Data Classification (2nd Edition))
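Replacing part of a standard convolution with cost-efficient linear transformations to generate redundant feature maps is commonly realized with a Ghost-style block; the sketch below illustrates that general idea under the assumption that the cheap transformations are depthwise convolutions, which may differ from the authors' design.

```python
import torch
import torch.nn as nn

class CheapFeatureBlock(nn.Module):
    """Generates half of the output channels with a regular convolution and the
    other half with cheap depthwise (linear) transformations of those features."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        primary = out_ch // 2
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary, 3, padding=1, bias=False),
            nn.BatchNorm2d(primary), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                  # depthwise conv = inexpensive linear map
            nn.Conv2d(primary, out_ch - primary, 3, padding=1, groups=primary, bias=False),
            nn.BatchNorm2d(out_ch - primary), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)  # redundant features at low compute cost

out = CheapFeatureBlock(3, 64)(torch.randn(1, 3, 128, 128))   # (1, 64, 128, 128)
```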

15 pages, 3538 KiB  
Article
Multi-Wind Turbine Wind Speed Prediction Based on Weighted Diffusion Graph Convolution and Gated Attention Network
by Yakai Qiao, Hui Chen and Bo Fu
Energies 2024, 17(7), 1658; https://doi.org/10.3390/en17071658 - 30 Mar 2024
Cited by 2 | Viewed by 1203
Abstract
The complex environmental impact makes it difficult to predict wind speed with high precision for multiple wind turbines. Most existing research methods model the temporal dependence of wind speeds, ignoring the spatial correlation between wind turbines. In this paper, we propose a multi-wind turbine wind speed prediction model based on Weighted Diffusion Graph Convolution and Gated Attention Network (WDGCGAN). To address the strong nonlinear correlation problem among multiple wind turbines, we use the maximal information coefficient (MIC) method to calculate the correlation weights between wind turbines and construct a weighted graph for multiple wind turbines. Next, by applying Diffusion Graph Convolution (DGC) transformation to the weight matrix of the weighted graph, we obtain the spatial graph diffusion matrix of the wind farm to aggregate the high-order neighborhood information of the graph nodes. Finally, by combining the DGC with the gated attention recurrent unit (GAU), we establish a spatio-temporal model for multi-turbine wind speed prediction. Experiments on the wind farm data in Massachusetts show that the proposed method can effectively aggregate the spatio-temporal information of wind turbine nodes and improve the prediction accuracy of multiple wind speeds. In the 1h prediction task, the average RMSE of the proposed model is 28% and 33.1% lower than that of the Long Short-Term Memory Network (LSTM) and Convolutional Neural Network (CNN), respectively. Full article
(This article belongs to the Topic Solar and Wind Power and Energy Forecasting)
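Diffusion graph convolution over the MIC-weighted turbine graph amounts to propagating node features through powers of a row-normalized transition matrix. The numpy sketch below shows that propagation step in isolation; the two diffusion steps and the plain concatenation of hops are assumptions for illustration.

```python
import numpy as np

def diffusion_graph_conv(X, W, K=2):
    """K-step diffusion graph convolution: aggregates each turbine's features from
    its 1..K-hop weighted neighborhood via powers of the transition matrix.

    X: (N, F) node features (e.g., recent wind-speed readings per turbine)
    W: (N, N) MIC-based correlation weights between turbines
    returns: (N, F * (K + 1)) diffused features, one block per diffusion step
    """
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-9)   # row-normalized transition matrix
    out, H = [X], X
    for _ in range(K):
        H = P @ H                        # one diffusion step over the weighted graph
        out.append(H)
    return np.concatenate(out, axis=1)   # a learned linear layer would mix these in the full model

feats = diffusion_graph_conv(np.random.rand(10, 4), np.random.rand(10, 10))
```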

13 pages, 3774 KiB  
Article
Nested Contrastive Boundary Learning: Point Transformer Self-Attention Regularization for 3D Intracranial Aneurysm Segmentation
by Luis Felipe Estrella-Ibarra, Alejandro de León-Cuevas and Saul Tovar-Arriaga
Technologies 2024, 12(3), 28; https://doi.org/10.3390/technologies12030028 - 21 Feb 2024
Cited by 2 | Viewed by 2340
Abstract
In 3D segmentation, point-based models excel but face difficulties in precise class delineation at class intersections, an inherent challenge in segmentation models. This is particularly critical in medical applications, influencing patient care and surgical planning, where accurate 3D boundary identification is essential for assisting surgery and enhancing medical training through advanced simulations. This study introduces the Nested Contrastive Boundary Learning Point Transformer (NCBL-PT), specially designed for 3D point cloud segmentation. NCBL-PT employs contrastive learning to improve boundary point representation by enhancing feature similarity within the same class. NCBL-PT incorporates a border-aware distinction within the same class points, allowing the model to distinctly learn from both points in proximity to the class intersection and from those beyond. This reduces semantic confusion among the points of different classes in the ambiguous class intersection zone, where similarity in features due to proximity could lead to incorrect associations. The model operates within subsampled point clouds at each encoder block stage of the point transformer architecture. It applies self-attention with k = 16 nearest neighbors to local neighborhoods, aligning with NCBL calculations for consistent self-attention regularization in local contexts. NCBL-PT improves 3D segmentation at class intersections, as evidenced by a 3.31% increase in Intersection over Union (IOU) for aneurysm segmentation compared to the base point transformer model. Full article
(This article belongs to the Section Assistive Technologies)

29 pages, 21933 KiB  
Article
Enhancing Hyperspectral Anomaly Detection with a Novel Differential Network Approach for Precision and Robust Background Suppression
by Jiajia Zhang, Pei Xiang, Xiang Teng, Dong Zhao, Huan Li, Jiangluqi Song, Huixin Zhou and Wei Tan
Remote Sens. 2024, 16(3), 434; https://doi.org/10.3390/rs16030434 - 23 Jan 2024
Cited by 2 | Viewed by 1771
Abstract
The existing deep-learning-based hyperspectral anomaly detection methods detect anomalies by reconstructing a clean background. However, these methods model the background of the hyperspectral image (HSI) through global features, neglecting local features. In complex background scenarios, these methods struggle to obtain accurate background priors for training constraints, thereby limiting the anomaly detection performance. To enhance the capability of the network in extracting local features and improve anomaly detection performance, a hyperspectral anomaly detection method based on differential network is proposed. First, we posit that anomalous pixels are challenging to be reconstructed through the features of surrounding pixels. A differential convolution method is introduced to extract local punctured neighborhood features in the HSI. The differential convolution contains two types of kernels with different receptive fields. These kernels are adopted to obtain the outer window features and inner window features. Second, to improve the feature extraction capability of the network, a local detail attention and a local Transformer attention are proposed. These attention modules enhance the inner window features. Third, the obtained inner window features are subtracted from the outer window features to derive differential features, which encapsulate local punctured neighborhood characteristics. The obtained differential features are employed to reconstruct the background of the HSI. Finally, the anomaly detection results are extracted from the difference between the input HSI and the reconstructed background of the HSI. In the proposed method, for each receptive field kernel, the optimization objective is to reconstruct the input HSI rather than the background HSI. This way circumvents problems where the background constraint biases might affect detection performance. The proposed method offers researchers a new and effective approach for applying deep learning in a local area to the field of hyperspectral anomaly detection. The experiments are conducted with multiple metrics on five real-world datasets. The proposed method outperforms eight state-of-the-art methods in both subjective and objective evaluations. Full article
(This article belongs to the Special Issue Remote Sensing for Geology and Mapping)
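The local punctured-neighborhood idea, where the outer-window kernel sees surrounding pixels but not the center, can be emulated by masking the center of a convolution kernel and subtracting an inner-window response. The module below is a simplified illustration of that differential step, not the authors' full network; the window sizes are assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PuncturedDifferentialConv(nn.Module):
    """Outer-window kernel with its center (inner window) zeroed out, so each pixel is
    reconstructed only from surrounding spectra; subtracting an inner-window feature
    highlights pixels that the neighborhood cannot explain (potential anomalies)."""
    def __init__(self, bands, feats, outer=5, inner=1):
        super().__init__()
        self.outer_conv = nn.Conv2d(bands, feats, outer, padding=outer // 2, bias=False)
        self.inner_conv = nn.Conv2d(bands, feats, inner, padding=inner // 2, bias=False)
        mask = torch.ones(1, 1, outer, outer)
        c = outer // 2
        mask[..., c - inner // 2: c + inner // 2 + 1, c - inner // 2: c + inner // 2 + 1] = 0
        self.register_buffer("mask", mask)               # punctures the receptive field

    def forward(self, x):                                # x: (B, bands, H, W) HSI patch
        outer_feat = F.conv2d(x, self.outer_conv.weight * self.mask,
                              padding=self.outer_conv.padding)
        inner_feat = self.inner_conv(x)
        return outer_feat - inner_feat                   # differential features

diff = PuncturedDifferentialConv(bands=100, feats=32)(torch.randn(1, 100, 64, 64))
```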

27 pages, 5536 KiB  
Article
Multi-Modal Contrastive Learning for LiDAR Point Cloud Rail-Obstacle Detection in Complex Weather
by Lu Wen, Yongliang Peng, Miao Lin, Nan Gan and Rongqing Tan
Electronics 2024, 13(1), 220; https://doi.org/10.3390/electronics13010220 - 3 Jan 2024
Cited by 6 | Viewed by 2690
Abstract
Obstacle intrusion is a serious threat to the safety of railway traffic. LiDAR point cloud 3D semantic segmentation (3DSS) provides a new method for unmanned rail-obstacle detection. However, the inevitable degradation of model performance occurs in complex weather and hinders its practical application. In this paper, a multi-modal contrastive learning (CL) strategy, named DHT-CL, is proposed to improve point cloud 3DSS in complex weather for rail-obstacle detection. DHT-CL is a camera and LiDAR sensor fusion strategy specifically designed for complex weather and obstacle detection tasks, without the need for image input during the inference stage. We first demonstrate how the sensor fusion method is more robust under rainy and snowy conditions, and then we design a Dual-Helix Transformer (DHT) to extract deeper cross-modal information through a neighborhood attention mechanism. Then, an obstacle anomaly-aware cross-modal discrimination loss is constructed for collaborative optimization that adapts to the anomaly identification task. Experimental results on a complex weather railway dataset show that with an mIoU of 87.38%, the proposed DHT-CL strategy achieves better performance compared to other high-performance models from the autonomous driving dataset, SemanticKITTI. The qualitative results show that DHT-CL achieves higher accuracy in clear weather and reduces false alarms in rainy and snowy weather. Full article
(This article belongs to the Special Issue Advanced Technologies in Intelligent Transportation Systems)

17 pages, 1475 KiB  
Article
Change Detection Needs Neighborhood Interaction in Transformer
by Hangling Ma, Lingran Zhao, Bingquan Li, Ruiqing Niu and Yueyue Wang
Remote Sens. 2023, 15(23), 5459; https://doi.org/10.3390/rs15235459 - 22 Nov 2023
Cited by 4 | Viewed by 1482
Abstract
Remote sensing image change detection (CD) is an essential technique for analyzing surface changes from co-registered images of different time periods. The main challenge in CD is to identify the alterations that the user intends to emphasize, while excluding pseudo-changes caused by external factors. Recent advancements in deep learning and image change detection have shown remarkable performance with ConvNet-based and Transformer-based techniques. However, ConvNet-based methods are limited by the local receptive fields of convolutional kernels that cannot effectively capture the change features in spatial–temporal information, while Transformer-based CD models need to be driven by a large amount of data due to the lack of inductive biases, and at the same time need to bear the costly computational complexity brought by self-attention. To address these challenges, we propose a Transformer-based Siamese network structure called BTNIFormer. It incorporates a sparse attention mechanism called Dilated Neighborhood Attention (DiNA), which localizes the attention range of each pixel to its neighboring context. Extensive experiments conducted on two publicly available datasets demonstrate the benefits of our proposed innovation. Compared to the most competitive recent Transformer-based approaches, our method achieves a significant 12.00% improvement in IoU while reducing computational costs by half. This provides a promising solution for further development of the Transformer structure in CD tasks. Full article
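Dilated Neighborhood Attention (DiNA) restricts each pixel's attention to a dilated local window rather than the whole image. The single-head sketch below uses `unfold` to gather the window; the 3 x 3 window and dilation of 2 are illustrative, and efficient production implementations (e.g., the NATTEN library) differ considerably.

```python
import torch
import torch.nn.functional as F

def dilated_neighborhood_attention(x, k=3, dilation=2):
    """Single-head DiNA-style attention: every pixel attends only to a k x k window
    of neighbors sampled with the given dilation, instead of the whole image.
    x: (B, C, H, W) -> (B, C, H, W)
    """
    b, c, h, w = x.shape
    pad = dilation * (k // 2)
    neigh = F.unfold(x, kernel_size=k, dilation=dilation, padding=pad)   # (B, C*k*k, H*W)
    neigh = neigh.view(b, c, k * k, h * w)
    q = x.view(b, c, 1, h * w)                                           # each pixel is its own query
    attn = (q * neigh).sum(dim=1, keepdim=True) / c ** 0.5               # dot-product scores
    attn = attn.softmax(dim=2)                                           # over the dilated local window
    out = (attn * neigh).sum(dim=2)
    return out.view(b, c, h, w)

y = dilated_neighborhood_attention(torch.randn(1, 32, 64, 64))
```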

16 pages, 1349 KiB  
Article
Payment Behavioral Response Mechanisms for All-Age Retrofitting of Older Communities: A Study among Chinese Residents
by Yang Zhang and Lei Dong
Behav. Sci. 2023, 13(11), 925; https://doi.org/10.3390/bs13110925 - 13 Nov 2023
Viewed by 1685
Abstract
Intergenerational integration has given rise to a novel aging paradigm known as all-age communities, which is garnering international attention. In China, the aging population and the implementation of the three-child policy have resulted in increased demand for retirement and childcare services among residents in older neighborhoods. Consequently, there is a pressing need to retrofit these older neighborhoods to accommodate all-age living arrangements given the high demand they generate. Therefore, this study undertakes research interviews with residents and constructs an exploratory theoretical model rooted in established theory. To assess the significance of our model, we employ Smart PLS 3.0 based on 297 empirical data points. Our findings indicate that anxiety has a significant negative effect on payment behavior; objective perception, willingness to pay, and government assistance exert significant positive effects on payment behavior. By comprehensively analyzing the mechanisms underlying residents’ payment behavior, this study provides valuable insights for the government for promoting the aging process within communities and formulating effective transformation policies. Full article
