Search Results (73)

Search Parameters:
Keywords = Mamba

21 pages, 2031 KiB  
Article
ConvMambaSR: Leveraging State-Space Models and CNNs in a Dual-Branch Architecture for Remote Sensing Imagery Super-Resolution
by Qiwei Zhu, Guojing Zhang, Xuechao Zou, Xiaoying Wang, Jianqiang Huang and Xilai Li
Remote Sens. 2024, 16(17), 3254; https://doi.org/10.3390/rs16173254 - 2 Sep 2024
Viewed by 577
Abstract
Deep learning-based super-resolution (SR) techniques play a crucial role in enhancing the spatial resolution of images. However, remote sensing images present substantial challenges due to their diverse features, complex structures, and significant size variations in ground objects. Moreover, recovering lost details from low-resolution remote sensing images with complex and unknown degradations, such as downsampling, noise, and compression, remains a critical issue. To address these challenges, we propose ConvMambaSR, a novel super-resolution framework that integrates state-space models (SSMs) and Convolutional Neural Networks (CNNs). This framework is specifically designed to handle heterogeneous and complex ground features, as well as unknown degradations in remote sensing imagery. ConvMambaSR leverages SSMs to model global dependencies, activating more pixels in the super-resolution task. Concurrently, it employs CNNs to extract local detail features, enhancing the model’s ability to capture image textures and edges. Furthermore, we have developed a global–detail reconstruction module (GDRM) to integrate diverse levels of global and local information efficiently. We rigorously validated the proposed method on two distinct datasets, RSSCN7 and RSSRD-KQ, and benchmarked its performance against state-of-the-art SR models. Experiments show that our method achieves state-of-the-art PSNR values of 26.06 dB and 24.29 dB on these datasets, respectively, is visually superior, handles a variety of scenarios effectively, and significantly outperforms existing methods.
(This article belongs to the Special Issue Image Enhancement and Fusion Techniques in Remote Sensing)
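To make the dual-branch idea above concrete, here is a minimal PyTorch sketch of a block that fuses a local CNN branch with a global scanning branch. It is an illustration under stated assumptions, not the authors' code: the SSM branch is replaced by a simple learnable-decay scan over flattened pixels, and GDRM-style fusion is reduced to channel concatenation plus a 1x1 convolution.

```python
import torch
import torch.nn as nn

class GlobalScanBranch(nn.Module):
    # Stand-in for the SSM branch: a per-channel, learnable-decay scan over
    # the flattened pixel sequence. Real Mamba uses input-dependent dynamics;
    # this only mimics the "global mixing" role for illustration.
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)
        self.decay = nn.Parameter(torch.zeros(channels))

    def forward(self, x):
        b, c, h, w = x.shape
        seq = self.proj(x).flatten(2)          # (B, C, L) with L = H*W
        a = torch.sigmoid(self.decay).view(1, c)
        state = seq.new_zeros(b, c)
        outs = []
        for t in range(seq.shape[-1]):         # causal scan over pixels
            state = a * state + (1 - a) * seq[..., t]
            outs.append(state)
        return torch.stack(outs, dim=-1).view(b, c, h, w)

class DualBranchBlock(nn.Module):
    # CNN branch for local texture/edges plus the global branch, fused by a
    # 1x1 conv (a guess at what a global-detail fusion module might reduce to).
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.global_branch = GlobalScanBranch(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        return x + self.fuse(torch.cat([self.local(x), self.global_branch(x)], dim=1))

feats = torch.randn(1, 32, 16, 16)
print(DualBranchBlock(32)(feats).shape)  # torch.Size([1, 32, 16, 16])
```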

18 pages, 5652 KiB  
Article
LDMNet: Enhancing the Segmentation Capabilities of Unmanned Surface Vehicles in Complex Waterway Scenarios
by Tongyang Dai, Huiyu Xiang, Chongjie Leng, Song Huang, Guanghui He and Shishuo Han
Appl. Sci. 2024, 14(17), 7706; https://doi.org/10.3390/app14177706 - 31 Aug 2024
Viewed by 499
Abstract
Semantic segmentation-based Complex Waterway Scene Understanding has shown great promise in the environmental perception of Unmanned Surface Vehicles. Existing methods struggle with estimating the edges of obstacles under conditions of blurred water surfaces. To address this, we propose the Lightweight Dual-branch Mamba Network (LDMNet), which includes a CNN-based Deep Dual-branch Network for extracting image features and a Mamba-based fusion module for aggregating and integrating global information. Specifically, we improve the Deep Dual-branch Network structure by incorporating multiple Atrous branches for local fusion; we design a Convolution-based Recombine Attention Module, which serves as the gate activation condition for Mamba-2 to enhance feature interaction and global information fusion from both spatial and channel dimensions. Moreover, to tackle the directional sensitivity of image serialization and the impact of the State Space Model’s forgetting strategy on non-causal data modeling, we introduce a Hilbert curve scanning mechanism to achieve multi-scale feature serialization. By stacking feature sequences, we alleviate the local bias of Mamba-2 towards image sequence data. LDMNet integrates the Deep Dual-branch Network, Recombine Attention, and Mamba-2 blocks, effectively capturing the long-range dependencies and multi-scale global context information of Complex Waterway Scene images. The experimental results on four benchmarks show that the proposed LDMNet significantly improves obstacle edge segmentation performance and outperforms existing methods across various performance metrics.
(This article belongs to the Section Marine Science and Engineering)
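The Hilbert-curve serialization mentioned in the abstract is easy to sketch. The snippet below is an illustration, not LDMNet's implementation: it orders the pixels of a square, power-of-two-sized feature map along a Hilbert curve before a 1D sequence model consumes them, so spatial neighbors tend to remain adjacent in the sequence; the paper's multi-scale stacking is omitted.

```python
import torch

def hilbert_index(n, x, y):
    # Map grid coordinates (x, y) on an n x n grid (n a power of two) to the
    # cell's position along the Hilbert curve (classic bit-twiddling version).
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_serialize(feat):
    # feat: (B, C, H, W) with H == W == 2**k. Returns (B, C, H*W) with pixels
    # ordered along the Hilbert curve, so spatial neighbors tend to stay
    # close in the sequence (the motivation given in the abstract).
    b, c, h, w = feat.shape
    assert h == w and (h & (h - 1)) == 0, "square, power-of-two side assumed"
    order = sorted(range(h * w), key=lambda i: hilbert_index(h, i % w, i // w))
    idx = torch.tensor(order, device=feat.device)
    return feat.flatten(2)[..., idx]

x = torch.randn(2, 8, 16, 16)
print(hilbert_serialize(x).shape)  # torch.Size([2, 8, 256])
```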

20 pages, 2467 KiB  
Article
RegMamba: An Improved Mamba for Medical Image Registration
by Xin Hu, Jiaqi Chen and Yilin Chen
Electronics 2024, 13(16), 3305; https://doi.org/10.3390/electronics13163305 - 20 Aug 2024
Viewed by 635
Abstract
Deformable medical image registration aims to minimize the differences between fixed and moving images to provide comprehensive physiological or structural information for further medical analysis. Traditional learning-based convolutional network approaches usually suffer from perceptual limitations, and in recent years the Transformer architecture has gained popularity for its superior long-range relational modeling capabilities, but it still faces severe computational challenges in handling high-resolution medical images. Recently, selective state-space models have shown great potential in the vision domain due to their fast inference and efficient modeling. Inspired by this, we propose RegMamba, a novel medical image registration architecture that combines convolutional and state-space models (SSMs), designed to efficiently capture complex correspondences in registration while keeping computational cost low. First, our model introduces Mamba to efficiently model long-range dependencies in the data and capture large deformations. At the same time, we use a scaled convolutional layer in Mamba to alleviate the spatial information loss that arises when 3D data are flattened. Then, a deformable convolutional residual module (DCRM) is proposed to adaptively adjust sampling positions and process deformations, capturing more flexible spatial features while learning fine-grained features of different anatomical structures to construct local correspondences and improve the model’s perception. We demonstrate the advanced registration performance of our method on the LPBA40 and IXI public datasets.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images, 2nd Edition)
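As a rough 2D illustration of what a deformable convolutional residual module could look like (the paper registers 3D volumes, and its DCRM design is not reproduced here), the sketch below predicts per-position sampling offsets with a small convolution and feeds them to torchvision's DeformConv2d inside a residual connection.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformResBlock(nn.Module):
    # A guess at a 2D deformable residual block: a small conv predicts
    # per-position sampling offsets, a deformable conv samples at those
    # shifted locations, and a residual connection preserves the input.
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # 2 offsets (dx, dy) per kernel tap
        self.offset_pred = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=pad)
        self.act = nn.GELU()

    def forward(self, x):
        offset = self.offset_pred(x)
        return x + self.act(self.deform(x, offset))

x = torch.randn(1, 16, 32, 32)
print(DeformResBlock(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```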

14 pages, 7579 KiB  
Article
Optimization and Application of Improved YOLOv9s-UI for Underwater Object Detection
by Wei Pan, Jiabao Chen, Bangjun Lv and Likun Peng
Appl. Sci. 2024, 14(16), 7162; https://doi.org/10.3390/app14167162 - 15 Aug 2024
Viewed by 407
Abstract
The You Only Look Once (YOLO) series of object detection models is widely recognized for its efficiency and real-time performance, particularly under the challenging conditions of underwater environments, which are characterized by insufficient lighting and visual disturbances. By modifying the YOLOv9s model, this study aims to improve the accuracy and real-time capabilities of underwater object detection, resulting in the YOLOv9s-UI detection model. The proposed model incorporates the Dual Dynamic Token Mixer (D-Mixer) module from TransXNet to improve feature extraction capabilities. Additionally, it integrates a feature fusion network design from the LocalMamba network, employing channel and spatial attention mechanisms. These attention modules effectively guide the feature fusion process, significantly enhancing detection accuracy while maintaining a compact model size of only 9.3 M. Experimental evaluation on the UCPR2019 underwater object dataset shows that the YOLOv9s-UI model achieves higher accuracy and recall than the existing YOLOv9s model, along with excellent real-time performance. By introducing advanced feature extraction and attention mechanisms, the model significantly improves underwater target detection, meets portability requirements, and provides a more efficient solution for underwater detection.
(This article belongs to the Section Marine Science and Engineering)
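The channel- and spatial-attention-guided fusion described above can be sketched with a CBAM-style gate; the snippet below is a generic illustration of the idea, not the LocalMamba fusion design the paper adopts. Two feature maps are summed, then re-weighted channel-wise and position-wise.

```python
import torch
import torch.nn as nn

class AttentionGuidedFusion(nn.Module):
    # Generic channel + spatial attention gating for fusing two branches.
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, a, b):
        x = a + b
        x = x * self.channel_gate(x)                  # channel attention
        pooled = torch.cat([x.mean(1, keepdim=True),  # spatial attention
                            x.amax(1, keepdim=True)], dim=1)
        return x * self.spatial_gate(pooled)

f1, f2 = torch.randn(1, 64, 20, 20), torch.randn(1, 64, 20, 20)
print(AttentionGuidedFusion(64)(f1, f2).shape)  # torch.Size([1, 64, 20, 20])
```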

20 pages, 1572 KiB  
Article
MLGTM: Multi-Scale Local Geometric Transformer-Mamba Application in Terracotta Warriors Point Cloud Classification
by Pengbo Zhou, Li An, Yong Wang and Guohua Geng
Remote Sens. 2024, 16(16), 2920; https://doi.org/10.3390/rs16162920 - 9 Aug 2024
Viewed by 626
Abstract
The Terracotta Warriors are an important representative of ancient Chinese cultural heritage, and the classification of their point cloud data aids cultural heritage preservation and digital reconstruction. However, these data face challenges such as complex morphological and structural variations, sparsity, and irregularity. This paper proposes a method named Multi-scale Local Geometric Transformer-Mamba (MLGTM) to improve the accuracy and robustness of Terracotta Warriors point cloud classification tasks. To effectively capture the geometric information of point clouds, we introduce local geometric encoding, including local coordinate and feature information, which effectively captures the complex local morphology and structural variations of the Terracotta Warriors and extracts representative local features. Additionally, we propose a multi-scale Transformer-Mamba information aggregation module, which employs a dual-branch design pairing a Transformer with a Mamba structure and aggregates the two branches across multiple scales to effectively handle the sparsity and irregularity of the Terracotta Warriors point cloud data. We conducted experiments on several datasets, including the ModelNet40, ScanObjectNN, ShapeNetPart, ETH, and 3D Terracotta Warriors fragment datasets. The results show that our method significantly improves Terracotta Warriors point cloud classification, demonstrating strong accuracy.
(This article belongs to the Special Issue New Perspectives on 3D Point Cloud II)
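One plausible reading of the local geometric encoding described above is a k-nearest-neighbor encoding of each point's neighborhood; the sketch below encodes center coordinates, relative offsets, and distances. The paper's exact encoding may differ.

```python
import torch

def local_geometric_encoding(points, k=16):
    # points: (B, N, 3). For each point, find its k nearest neighbors and
    # encode each neighborhood as [center, relative offset, distance].
    dist = torch.cdist(points, points)                 # (B, N, N)
    knn_idx = dist.topk(k, largest=False).indices      # (B, N, k), incl. self
    b_idx = torch.arange(points.shape[0]).view(-1, 1, 1)
    neighbors = points[b_idx, knn_idx]                 # (B, N, k, 3)
    rel = neighbors - points.unsqueeze(2)              # relative coordinates
    d = rel.norm(dim=-1, keepdim=True)                 # (B, N, k, 1)
    center = points.unsqueeze(2).expand_as(neighbors)
    return torch.cat([center, rel, d], dim=-1)         # (B, N, k, 7)

pts = torch.randn(2, 1024, 3)
print(local_geometric_encoding(pts).shape)  # torch.Size([2, 1024, 16, 7])
```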

20 pages, 19406 KiB  
Article
A Novel Real-Time Detection and Classification Method for ECG Signal Images Based on Deep Learning
by Linjuan Ma and Fuquan Zhang
Sensors 2024, 24(16), 5087; https://doi.org/10.3390/s24165087 - 6 Aug 2024
Viewed by 527
Abstract
In this paper, a novel deep learning method, Mamba-RAYOLO, is presented, which can improve the detection and classification of ECG images in real time by integrating three advanced modules. The feature extraction module, which adopts a multi-branch structure during training, can capture a wide range of features to ensure efficient inference and rich feature extraction. The attention mechanism module dynamically focuses on the most relevant spatial and channel-wise features to improve detection accuracy and computational efficiency. The extracted features are then refined for efficient spatial feature processing and robust feature fusion. Several sets of experiments were carried out to test the validity of the proposed Mamba-RAYOLO, and they indicate that our method achieves significant improvements in the detection and classification of ECG images. The research offers a promising framework for more accurate and efficient medical ECG diagnostics.
(This article belongs to the Special Issue Sensors Technology and Application in ECG Signal Processing)
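A multi-branch training structure that still ensures efficient inference is commonly realized with structural re-parameterization; whether Mamba-RAYOLO does this is not stated, so the sketch below is a generic, BN-free RepVGG-style block in which the 1x1 and identity branches are folded into a single 3x3 convolution after training.

```python
import torch
import torch.nn as nn

class RepBlock(nn.Module):
    # Train with parallel 3x3 + 1x1 + identity branches, then fold all three
    # into a single 3x3 conv for fast inference (BatchNorm omitted for brevity).
    def __init__(self, channels):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.conv3(x) + self.conv1(x) + x

    def fuse(self):
        # Fold the 1x1 branch and the identity into the 3x3 kernel.
        fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels,
                          3, padding=1)
        w = self.conv3.weight.clone()
        w[:, :, 1:2, 1:2] += self.conv1.weight      # 1x1 sits at the center tap
        eye = torch.zeros_like(w)
        for i in range(w.shape[0]):                 # identity as a 3x3 tap
            eye[i, i, 1, 1] = 1.0
        fused.weight.data = w + eye
        fused.bias.data = self.conv3.bias.data + self.conv1.bias.data
        return fused

block = RepBlock(8).eval()
x = torch.randn(1, 8, 14, 14)
print(torch.allclose(block(x), block.fuse()(x), atol=1e-5))  # True
```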

19 pages, 7542 KiB  
Article
The Mamba Model: A Novel Approach for Predicting Ship Trajectories
by Yongfeng Suo, Zhengnian Ding and Tao Zhang
J. Mar. Sci. Eng. 2024, 12(8), 1321; https://doi.org/10.3390/jmse12081321 - 5 Aug 2024
Viewed by 579
Abstract
To address the complexity of ship trajectory prediction, this study explored the efficacy of the Mamba model, a relatively new deep-learning framework. In order to evaluate the performance of the Mamba model relative to traditional models, which often struggle to cope with the dynamic and nonlinear nature of maritime navigation data, we analyzed a dataset consisting of intricate ship trajectory data. The prediction accuracy and inference speed of the model were evaluated using metrics such as the mean absolute error (MAE) and root mean square error (RMSE). The Mamba model not only excelled in computational efficiency, with inference times of 0.1759 s per batch (approximately 7.84 times faster than the widely used Transformer model), but also processed 3.9052 samples per second, higher than the Transformer model’s 0.7246 samples per second. Additionally, it demonstrated high prediction accuracy and the lowest loss among the evaluated models. The Mamba model provides a new tool for ship trajectory prediction, representing an advancement over existing deep-learning methods in addressing the challenges of maritime trajectory analysis.
(This article belongs to the Section Ocean Engineering)
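For readers unfamiliar with the Mamba recurrence at the heart of such trajectory predictors, the following is a deliberately simplified selective state-space scan written as an explicit loop; real implementations use a fused parallel scan, and this is not the paper's model. The input could be an embedding of per-timestep AIS features such as latitude, longitude, speed, and course.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveScan(nn.Module):
    # Simplified single-layer selective SSM in the spirit of Mamba:
    # diagonal A; input-dependent dt, B, C.
    def __init__(self, d_model, d_state=16):
        super().__init__()
        self.A_log = nn.Parameter(torch.zeros(d_model, d_state))
        self.dt_proj = nn.Linear(d_model, d_model)
        self.B_proj = nn.Linear(d_model, d_state)
        self.C_proj = nn.Linear(d_model, d_state)

    def forward(self, x):                      # x: (B, L, D)
        bsz, length, d = x.shape
        A = -torch.exp(self.A_log)             # negative => stable decay
        dt = F.softplus(self.dt_proj(x))       # (B, L, D)
        Bmat, Cmat = self.B_proj(x), self.C_proj(x)   # (B, L, N) each
        h = x.new_zeros(bsz, d, self.A_log.shape[1])  # hidden state (B, D, N)
        ys = []
        for t in range(length):
            dA = torch.exp(dt[:, t].unsqueeze(-1) * A)                # (B, D, N)
            dBx = dt[:, t].unsqueeze(-1) * Bmat[:, t].unsqueeze(1) * x[:, t].unsqueeze(-1)
            h = dA * h + dBx                   # discretized state update
            ys.append((h * Cmat[:, t].unsqueeze(1)).sum(-1))          # (B, D)
        return torch.stack(ys, dim=1)          # (B, L, D)

traj = torch.randn(4, 50, 32)                  # 50 AIS steps, 32-dim embedding
print(SelectiveScan(32)(traj).shape)           # torch.Size([4, 50, 32])
```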

17 pages, 4954 KiB  
Article
Medical Image Classification with a Hybrid SSM Model Based on CNN and Transformer
by Can Hu, Ning Cao, Han Zhou and Bin Guo
Electronics 2024, 13(15), 3094; https://doi.org/10.3390/electronics13153094 - 5 Aug 2024
Viewed by 783
Abstract
Medical image classification, a pivotal task for diagnostic accuracy, poses unique challenges due to the intricate and variable nature of medical images compared to their natural counterparts. While Convolutional Neural Networks (CNNs) and Transformers are prevalent in this domain, each architecture has its drawbacks. CNNs, despite their strength in local feature extraction, fall short in capturing global context, whereas Transformers excel at global information but can overlook fine-grained details. The integration of CNNs and Transformers in a hybrid model aims to bridge this gap by enabling simultaneous local and global feature extraction. However, this approach remains constrained in its capacity to model long-range dependencies, thereby hindering the efficient extraction of distant features. To address these issues, we introduce the MambaConvT model, which employs a state-space approach. It begins by locally processing input features through multi-core convolution, enhancing the extraction of deep, discriminative local details. Next, depthwise separable convolution with a 2D selective scanning module (SS2D) is employed to maintain a global receptive field and establish long-distance connections, capturing fine-grained features. The model then combines the hybrid features for comprehensive feature extraction, followed by global feature modeling to emphasize global detail information and optimize the feature representation. This paper conducts thorough performance experiments on different algorithms across four publicly available datasets and two private datasets. The results demonstrate that MambaConvT outperforms the latest classification algorithms in terms of accuracy, precision, recall, F1 score, and AUC, achieving superior performance in the precise classification of medical images.
(This article belongs to the Special Issue Biomedical Image Processing and Classification, 2nd Edition)
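The 2D selective-scan (SS2D) idea referenced above can be illustrated by unfolding a feature map into four directional sequences, mixing each with a causal 1D operator, and merging the results; in the sketch below the selective scan itself is stood in for by a causal depthwise convolution, so this is the scanning scaffolding only, not VMamba's module.

```python
import torch
import torch.nn as nn

def four_direction_sequences(x):
    # Unfold (B, C, H, W) into row-major, reversed row-major, column-major,
    # and reversed column-major sequences, so a causal 1D mixer can see every
    # pixel from every direction.
    rows = x.flatten(2)                        # (B, C, H*W), row-major
    cols = x.transpose(2, 3).flatten(2)        # column-major
    return [rows, rows.flip(-1), cols, cols.flip(-1)]

class SS2DLite(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.mixer = nn.Conv1d(channels, channels, 5, padding=4,
                               groups=channels)  # causal depthwise stand-in

    def forward(self, x):
        b, c, h, w = x.shape
        outs = []
        for i, seq in enumerate(four_direction_sequences(x)):
            y = self.mixer(seq)[..., :h * w]   # trim to keep causality
            if i in (1, 3):                    # undo the reversal
                y = y.flip(-1)
            if i in (2, 3):                    # undo the transpose
                y = y.view(b, c, w, h).transpose(2, 3).flatten(2)
            outs.append(y)
        return sum(outs).view(b, c, h, w)      # merge the four directions

x = torch.randn(1, 16, 8, 8)
print(SS2DLite(16)(x).shape)                   # torch.Size([1, 16, 8, 8])
```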

23 pages, 13090 KiB  
Article
Accurate UAV Small Object Detection Based on HRFPN and EfficientVMamba
by Shixiao Wu, Xingyuan Lu, Chengcheng Guo and Hong Guo
Sensors 2024, 24(15), 4966; https://doi.org/10.3390/s24154966 - 31 Jul 2024
Viewed by 614
Abstract
(1) Background: Small objects in Unmanned Aerial Vehicle (UAV) images are often scattered throughout various regions of the image, such as the corners, and may be blocked by larger objects, as well as susceptible to image noise. Moreover, due to their small size, these objects occupy a limited area in the image, resulting in a scarcity of effective features for detection. (2) Methods: To address the detection of small objects in UAV imagery, we introduce a novel algorithm called High-Resolution Feature Pyramid Network Mamba-Based YOLO (HRMamba-YOLO). This algorithm leverages the strengths of a High-Resolution Network (HRNet), EfficientVMamba, and YOLOv8, integrating a Double Spatial Pyramid Pooling (Double SPP) module, an Efficient Mamba Module (EMM), and a Fusion Mamba Module (FMM) to enhance feature extraction and capture contextual information. Additionally, a new multi-scale feature fusion network, the High-Resolution Feature Pyramid Network (HRFPN), together with the FMM, improves feature interactions and enhances small object detection performance. (3) Results: On the VisDroneDET dataset, the proposed algorithm achieved a 4.4% higher Mean Average Precision (mAP) than YOLOv8-m. On the Dota1.5 dataset, HRMamba achieved a mAP of 37.1%, surpassing YOLOv8-m by 3.8%. On the UCAS_AOD dataset and the DIOR dataset, our model’s mAP was 1.5% and 0.3% higher than that of YOLOv8-m, respectively. For a fair comparison, all models were trained without pre-trained weights. (4) Conclusions: This study not only highlights the exceptional performance and efficiency of HRMamba-YOLO in small object detection tasks but also provides innovative solutions and valuable insights for future research.
(This article belongs to the Section Sensing and Imaging)
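The Double SPP module is the paper's own design and is not reproduced here; for reference, the sketch below shows the standard YOLO-family SPPF block it presumably builds on, in which repeated 5x5 max-pools approximate pooling at growing receptive fields and the concatenated pyramid is squeezed back with a 1x1 conv.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    # Baseline SPPF: squeeze channels, apply the same 5x5 max-pool three
    # times (equivalent receptive fields of 5, 9, 13), concatenate, re-expand.
    def __init__(self, channels, hidden=None):
        super().__init__()
        hidden = hidden or channels // 2
        self.squeeze = nn.Conv2d(channels, hidden, 1)
        self.pool = nn.MaxPool2d(5, stride=1, padding=2)
        self.excite = nn.Conv2d(hidden * 4, channels, 1)

    def forward(self, x):
        x = self.squeeze(x)
        p1 = self.pool(x)
        p2 = self.pool(p1)
        p3 = self.pool(p2)
        return self.excite(torch.cat([x, p1, p2, p3], dim=1))

x = torch.randn(1, 64, 16, 16)
print(SPPF(64)(x).shape)  # torch.Size([1, 64, 16, 16])
```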

15 pages, 667 KiB  
Article
Multi-Dimensional Fusion Attention Mechanism with Vim-like Structure for Mobile Network Design
by Jialiang Shi, Rigui Zhou, Pengju Ren and Zhengyu Long
Appl. Sci. 2024, 14(15), 6670; https://doi.org/10.3390/app14156670 - 31 Jul 2024
Viewed by 478
Abstract
Recent advancements in mobile neural networks, such as the squeeze-and-excitation (SE) attention mechanism, have significantly improved model performance. However, they often overlook the crucial interaction between location information and channels. The interaction of multiple dimensions in feature engineering is of paramount importance for achieving high-quality results. The Transformer model and its successors, such as Mamba and Vision Mamba, have effectively combined features and linked location information; this approach has transitioned from NLP (natural language processing) to CV (computer vision). This paper introduces a novel attention mechanism for mobile neural networks inspired by the structure of Vim (Vision Mamba). It adopts a “1 + 3” architecture to embed multi-dimensional information into channel attention, termed the “Multi-Dimensional Vim-like Attention Mechanism”. The proposed method splits the input into two major branches: the left branch retains the original information for subsequent feature screening, while the right branch divides the channel attention into three one-dimensional feature encoding processes. These processes aggregate features along one channel direction and two spatial directions, simultaneously capturing remote dependencies and preserving precise location information. The resulting feature maps are then combined with the left branch to produce direction-aware, location-sensitive, and channel-aware attention maps. The multi-dimensional Vim-like attention module is simple and can be seamlessly integrated into classical mobile neural networks such as MobileNetV2 and ShuffleNetV2 with minimal computational overhead. Experimental results demonstrate that this attention module adapts well to mobile neural networks with a low parameter count, delivering excellent performance on the CIFAR-100 and MS COCO datasets.
(This article belongs to the Section Computing and Artificial Intelligence)
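A sketch of the "1 + 3" structure described above: the input is kept as the identity branch, while three one-dimensional encodings, pooled along the H direction, the W direction, and over all spatial positions (the channel descriptor), produce direction-aware and channel-aware gates. This is close in spirit to coordinate attention; the authors' exact wiring is likely different.

```python
import torch
import torch.nn as nn

class MultiDim1Plus3Attention(nn.Module):
    # "1" branch: the input itself. "3" branch: H-pooled, W-pooled, and
    # globally pooled encodings turned into sigmoid gates and multiplied in.
    def __init__(self, channels, reduction=8):
        super().__init__()
        hidden = max(channels // reduction, 4)
        self.reduce = nn.Conv2d(channels, hidden, 1)
        self.h_gate = nn.Conv2d(hidden, channels, 1)   # from (B, hid, H, 1)
        self.w_gate = nn.Conv2d(hidden, channels, 1)   # from (B, hid, 1, W)
        self.c_gate = nn.Conv2d(hidden, channels, 1)   # from (B, hid, 1, 1)

    def forward(self, x):
        ph = x.mean(dim=3, keepdim=True)               # (B, C, H, 1)
        pw = x.mean(dim=2, keepdim=True)               # (B, C, 1, W)
        pc = x.mean(dim=(2, 3), keepdim=True)          # (B, C, 1, 1)
        ah = torch.sigmoid(self.h_gate(self.reduce(ph)))
        aw = torch.sigmoid(self.w_gate(self.reduce(pw)))
        ac = torch.sigmoid(self.c_gate(self.reduce(pc)))
        return x * ah * aw * ac                        # broadcast over H, W, C

x = torch.randn(1, 32, 14, 14)
print(MultiDim1Plus3Attention(32)(x).shape)  # torch.Size([1, 32, 14, 14])
```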

21 pages, 6529 KiB  
Article
MambaSR: Arbitrary-Scale Super-Resolution Integrating Mamba with Fast Fourier Convolution Blocks
by Jin Yan, Zongren Chen, Zhiyuan Pei, Xiaoping Lu and Hua Zheng
Mathematics 2024, 12(15), 2370; https://doi.org/10.3390/math12152370 - 30 Jul 2024
Viewed by 682
Abstract
Traditional single image super-resolution (SISR) methods, which focus on integer-scale super-resolution, often require separate training for each scale factor, leading to increased computational resource consumption. In this paper, we propose MambaSR, a novel arbitrary-scale super-resolution approach integrating Mamba with Fast Fourier Convolution Blocks. MambaSR leverages the strengths of the Mamba state-space model to extract long-range dependencies. In addition, Fast Fourier Convolution Blocks are proposed to capture global information in the frequency domain. The experimental results demonstrate that MambaSR achieves superior performance compared to competing methods across various benchmark datasets. Specifically, on the Urban100 dataset, MambaSR outperforms MetaSR by 0.93 dB in PSNR and 0.0203 in SSIM, and on the Manga109 dataset it achieves an average PSNR improvement of 1.00 dB and an SSIM improvement of 0.0093. These results highlight the efficacy of MambaSR in enhancing image quality for arbitrary-scale super-resolution.
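The Fast Fourier Convolution idea is straightforward to sketch: transform to the frequency domain, mix real and imaginary parts with a 1x1 convolution (which has a global receptive field in the spatial domain), and transform back. The snippet below is the generic FFC spectral unit, not MambaSR's block.

```python
import torch
import torch.nn as nn

class FourierUnit(nn.Module):
    # Spectral branch of a Fast Fourier Convolution: rfft2 -> 1x1 conv over
    # stacked real/imag channels -> irfft2 back to the spatial domain.
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, 2 * channels, 1)
        self.act = nn.ReLU()

    def forward(self, x):
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")          # (B, C, H, W//2+1)
        z = torch.cat([spec.real, spec.imag], dim=1)     # (B, 2C, H, W//2+1)
        z = self.act(self.conv(z))
        real, imag = z.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")

x = torch.randn(1, 8, 32, 32)
print(FourierUnit(8)(x).shape)  # torch.Size([1, 8, 32, 32])
```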

20 pages, 27344 KiB  
Article
DeMambaNet: Deformable Convolution and Mamba Integration Network for High-Precision Segmentation of Ambiguously Defined Dental Radicular Boundaries
by Binfeng Zou, Xingru Huang, Yitao Jiang, Kai Jin and Yaoqi Sun
Sensors 2024, 24(14), 4748; https://doi.org/10.3390/s24144748 - 22 Jul 2024
Viewed by 702
Abstract
The incorporation of automatic segmentation methodologies into dental X-ray image analysis refines the paradigms of clinical diagnostics and therapeutic planning by facilitating meticulous, pixel-level articulation of both dental structures and proximate tissues. This underpins early pathological detection and meticulous disease progression monitoring. Nonetheless, conventional segmentation frameworks often encounter significant setbacks attributable to the intrinsic limitations of X-ray imaging, including compromised image fidelity, obscured delineation of structural boundaries, and the intricate anatomical structures of dental constituents such as pulp, enamel, and dentin. To surmount these impediments, we propose the Deformable Convolution and Mamba Integration Network, an innovative 2D dental X-ray image segmentation architecture that integrates a Coalescent Structural Deformable Encoder, a Cognitively-Optimized Semantic Enhance Module, and a Hierarchical Convergence Decoder. Collectively, these components bolster the management of multi-scale global features, fortify the stability of feature representation, and refine the amalgamation of feature vectors. A comparative assessment against 14 baselines underscores its efficacy, registering a 0.95% improvement in the Dice coefficient and a reduction of the 95th percentile Hausdorff distance to 7.494.
(This article belongs to the Special Issue Biomedical Imaging, Sensing and Signal Processing)

23 pages, 7788 KiB  
Article
A Novel Mamba Architecture with a Semantic Transformer for Efficient Real-Time Remote Sensing Semantic Segmentation
by Hao Ding, Bo Xia, Weilin Liu, Zekai Zhang, Jinglin Zhang, Xing Wang and Sen Xu
Remote Sens. 2024, 16(14), 2620; https://doi.org/10.3390/rs16142620 - 17 Jul 2024
Cited by 1 | Viewed by 826
Abstract
Real-time remote sensing segmentation technology is crucial for unmanned aerial vehicles (UAVs) in battlefield surveillance, land characterization observation, earthquake disaster assessment, etc., and can significantly enhance the application value of UAVs in military and civilian fields. To realize this potential, it is essential to develop real-time semantic segmentation methods that can be applied to resource-limited platforms, such as edge devices. The majority of mainstream real-time semantic segmentation methods rely on convolutional neural networks (CNNs) and transformers. However, CNNs cannot effectively capture long-range dependencies, while transformers have high computational complexity. This paper proposes a novel remote sensing Mamba architecture for real-time segmentation tasks in remote sensing, named RTMamba. Specifically, the backbone utilizes a Visual State-Space (VSS) block to extract deep features and maintains linear computational complexity, thereby capturing long-range contextual information. Additionally, a novel Inverted Triangle Pyramid Pooling (ITP) module is incorporated into the decoder. The ITP module can effectively filter redundant feature information and enhance the perception of objects and their boundaries in remote sensing images. Extensive experiments were conducted on three challenging aerial remote sensing segmentation benchmarks, including Vaihingen, Potsdam, and LoveDA. The results show that RTMamba achieves competitive performance advantages in terms of segmentation accuracy and inference speed compared to state-of-the-art CNN and transformer methods. To further validate the deployment potential of the model on embedded devices with limited resources, such as UAVs, we conducted tests on the Jetson AGX Orin edge device. The experimental results demonstrate that RTMamba achieves impressive real-time segmentation performance.

14 pages, 968 KiB  
Article
MambaReID: Exploiting Vision Mamba for Multi-Modal Object Re-Identification
by Ruijuan Zhang, Lizhong Xu, Song Yang and Li Wang
Sensors 2024, 24(14), 4639; https://doi.org/10.3390/s24144639 - 17 Jul 2024
Viewed by 684
Abstract
Multi-modal object re-identification (ReID) is a challenging task that seeks to identify objects across different image modalities by leveraging their complementary information. Traditional CNN-based methods are constrained by limited receptive fields, whereas Transformer-based approaches are hindered by high computational demands and a lack of convolutional biases. To overcome these limitations, we propose a novel fusion framework named MambaReID, integrating the strengths of both architectures with the effective VMamba. Specifically, our MambaReID consists of three components: Three-Stage VMamba (TSV), Dense Mamba (DM), and Consistent VMamba Fusion (CVF). TSV efficiently captures global context information and local details with low computational complexity. DM enhances feature discriminability by fully integrating inter-modality information with shallow and deep features through dense connections. Additionally, with well-aligned multi-modal images, CVF provides more granular modal aggregation, thereby improving feature robustness. The MambaReID framework, with its innovative components, not only achieves superior performance in multi-modal object ReID tasks, but also does so with fewer parameters and lower computational costs. Our proposed MambaReID’s effectiveness is validated by extensive experiments conducted on three multi-modal object ReID benchmarks.
(This article belongs to the Section Sensing and Imaging)

16 pages, 3732 KiB  
Article
Visual State Space Model for Image Deraining with Symmetrical Scanning
by Yaoqing Zhang, Xin He, Chunxia Zhan and Junjie Li
Symmetry 2024, 16(7), 871; https://doi.org/10.3390/sym16070871 - 9 Jul 2024
Viewed by 490
Abstract
Image deraining aims to mitigate the adverse effects of rain streaks on image quality. Recently, the advent of convolutional neural networks (CNNs) and Vision Transformers (ViTs) has catalyzed substantial advancements in this field. However, these methods fail to effectively balance model efficiency and image deraining performance. In this paper, we propose an effective, locally enhanced visual state space model for image deraining, called DerainMamba. Specifically, we introduce a global-aware state space model to better capture long-range dependencies with linear complexity. In contrast to existing methods that utilize fixed unidirectional scan mechanisms, we propose a direction-aware symmetrical scanning module to enhance the feature capture of rain streak direction. Furthermore, we integrate a local-aware mixture of experts into our framework to mitigate local pixel forgetting, thereby enhancing the overall quality of high-resolution image reconstruction. Experimental results validate that the proposed method surpasses state-of-the-art approaches on six benchmark datasets.
