Search Results (17,580)

Search Parameters:
Keywords = vision

25 pages, 6336 KiB  
Article
Effective Strategies for Enhancing Real-Time Weapons Detection in Industry
by Ángel Torregrosa-Domínguez, Juan A. Álvarez-García, Jose L. Salazar-González and Luis M. Soria-Morillo
Appl. Sci. 2024, 14(18), 8198; https://doi.org/10.3390/app14188198 - 12 Sep 2024
Abstract
Gun violence is a global problem that affects communities and individuals, posing challenges to safety and well-being. The use of autonomous weapons detection systems could significantly improve security worldwide. Despite notable progress in closed-circuit television-based weapons detection systems, several challenges persist, including real-time detection, improved accuracy in detecting small objects, and reducing false positives. This paper, based on our extensive experience in this field and successful private company contracts, presents a detection scheme comprising two modules that enhance the performance of a renowned detector. These modules not only augment the detector’s performance but also have a low negative impact on the inference time. Additionally, a scale-matching technique is utilised to enhance the detection of weapons with a small aspect ratio. The experimental results demonstrate that the scale-matching method enhances the detection of small objects, with an improvement of +13.23 in average precision compared to the non-use of this method. Furthermore, the proposed detection scheme effectively reduces the number of false positives (a 71% reduction in the total number of false positives) relative to the baseline model, while maintaining a low inference time (34 frames per second on an NVIDIA GeForce RTX-3060 card at a resolution of 720 pixels) in comparison to the baseline model (47 frames per second).
(This article belongs to the Special Issue Applications of Artificial Intelligence in Industrial Engineering)

17 pages, 8334 KiB  
Article
PAIBoard: A Neuromorphic Computing Platform for Hybrid Neural Networks in Robot Dog Application
by Guang Chen, Jian Cao, Chenglong Zou, Shuo Feng, Yi Zhong, Xing Zhang and Yuan Wang
Electronics 2024, 13(18), 3619; https://doi.org/10.3390/electronics13183619 - 12 Sep 2024
Abstract
Hybrid neural networks (HNNs), integrating the strengths of artificial neural networks (ANNs) and spiking neural networks (SNNs), provide a promising solution towards generic artificial intelligence. There is a prevailing trend towards designing unified SNN-ANN paradigm neuromorphic computing chips to support HNNs, but developing platforms to advance neuromorphic computing systems is equally essential. This paper presents the PAIBoard platform, which is designed to facilitate the implementation of HNNs. The platform comprises three main components: the upper computer, the communication module, and the neuromorphic computing chip. Both hardware and software performance measurements indicate that our platform achieves low power consumption, high energy efficiency and comparable task accuracy. Furthermore, PAIBoard is applied in a robot dog's tracking and obstacle avoidance system. The tracking module combines data from ultra-wide band (UWB) transceivers and vision, while the obstacle avoidance module utilizes depth information from an RGB-D camera, which further underscores the potential of our platform to tackle challenging tasks in real-world applications.

18 pages, 9000 KiB  
Article
Multilevel Geometric Feature Embedding in Transformer Network for ALS Point Cloud Semantic Segmentation
by Zhuanxin Liang and Xudong Lai
Remote Sens. 2024, 16(18), 3386; https://doi.org/10.3390/rs16183386 - 12 Sep 2024
Abstract
Effective semantic segmentation of Airborne Laser Scanning (ALS) point clouds is a crucial field of study and influences subsequent point cloud application tasks. Transformer networks have made significant progress in 2D/3D computer vision tasks, exhibiting superior performance. We propose a multilevel geometric feature embedding transformer network (MGFE-T), which aims to fully utilize the three-dimensional structural information carried by point clouds and enhance transformer performance in ALS point cloud semantic segmentation. In the encoding stage, we compute the geometric features surrounding the sampling points at each layer and embed them into the transformer workflow. To ensure that the receptive field of the self-attention mechanism and the geometric computation domain maintain a consistent scale at each layer, we propose a fixed-radius dilated KNN (FR-DKNN) search method to address the limitation of traditional KNN search methods in considering domain radius. In the decoding stage, we aggregate prediction deviations at each level into a unified loss value, enabling multilevel supervision to improve the network’s feature learning ability at different levels. The MGFE-T network can predict the class label of each point in an end-to-end manner. Experiments were conducted on three widely used benchmark datasets. The results indicate that the MGFE-T network achieves superior OA and mF1 scores on the LASDU and DFC2019 datasets and performs well on the ISPRS dataset with imbalanced classes.

13 pages, 3654 KiB  
Article
Effective Structural Parametric Form in Architecture Using Mycelium Bio-Composites
by Efstathios T. Gavriilidis, Maristella E. Voutetaki and Dimitrios G. Giouzepas
Architecture 2024, 4(3), 717-729; https://doi.org/10.3390/architecture4030037 - 12 Sep 2024
Abstract
This study investigates a parametric architectural design methodology that arises from the relationship between humans, architecture, and nature and utilizes modern technological means and sustainable construction materials. Specifically, it concerns a structure of mycelium bio-composite, produced at the lowest possible environmental cost. The design uses an optimal structural form to maximize the material’s efficiency. The development of the structure is initially modular, using two different types of geometric blocks. At the same time, the whole structure gradually becomes monolithic with the help of the plant part of the fungi, the mycelium. The basic 2D arch structure is initially assembled using two different geometric blocks. More complex configurations can be derived from this foundational module to meet various requirements for applications and structures. The structure will be constructed entirely of load-bearing mycelium blocks, with its geometry specifically designed to emphasize compression forms, enhancing the structural performance of the inherently weak material. This approach reflects an innovative vision for construction materials grounded in the principles of cultivation and growth from natural, earth-derived resources.

14 pages, 4199 KiB  
Article
Detection Method for Inter-Turn Short Circuit Faults in Dry-Type Transformers Based on an Improved YOLOv8 Infrared Image Slicing-Aided Hyper-Inference Algorithm
by Zhaochuang Zhang, Jianhua Xia, Yuchuan Wen, Liting Weng, Zuofu Ma, Hekai Yang, Haobo Yang, Jinyao Dou, Jingang Wang and Pengcheng Zhao
Energies 2024, 17(18), 4559; https://doi.org/10.3390/en17184559 - 12 Sep 2024
Abstract
Inter-Turn Short Circuit (ITSC) faults do not necessarily produce high temperatures but exhibit distinct heat distribution and characteristics. This paper proposes a novel fault diagnosis and identification scheme utilizing an improved You Only Look Once version 8 (YOLOv8) algorithm, enhanced with an infrared image slicing-aided hyper-inference (SAHI) technique, to automatically detect ITSC fault trajectories in dry-type transformers. The infrared image acquisition system gathers data on ITSC fault trajectories and captures images with varying contrast to enhance the robustness of the recognition model. Given that the fault trajectory constitutes a small portion of the overall infrared image and is subject to significant background interference, traditional recognition algorithms often misjudge or omit faults. To address this, a YOLOv8-based visual detection method incorporating Dynamic Snake Convolution (DSConv) and the Slicing-Aided Hyper-Inference algorithm is proposed. This method aims to improve recognition precision and accuracy for small targets in complex backgrounds, facilitating accurate detection of ITSC faults in dry-type transformers. Comparative tests with the YOLOv8 model, Fast Region-based Convolutional Neural Networks (Fast R-CNNs), and RetinaNet demonstrate that the enhancements significantly improve model convergence speed and fault trajectory detection accuracy. The approach offers valuable insights for advancing infrared image diagnostic technology in electrical power equipment.
(This article belongs to the Section F: Electrical Engineering)

24 pages, 3886 KiB  
Article
De-Carbonisation Pathways in Jiangxi Province, China: A Visualisation Based on Panel Data
by Shun Li, Jie Hua and Gaofeng Luo
Atmosphere 2024, 15(9), 1108; https://doi.org/10.3390/atmos15091108 - 11 Sep 2024
Abstract
Environmental degradation remains a pressing global concern, prompting many nations to adopt measures to mitigate carbon emissions. In response to international pressure, China has committed to achieving a carbon peak by 2030 and carbon neutrality by 2060. Despite extensive research on China’s overall carbon emissions, there has been limited focus on individual provinces, particularly less developed provinces. Jiangxi Province, an underdeveloped province in southeastern China, recorded the highest GDP (Gross Domestic Product) growth rate in the country in 2022, and it holds significant potential for carbon emission reduction. This study uses data from Jiangxi Province’s 14th Five-Year Plan and Vision 2035 to create three carbon emission reduction scenarios and predict emissions. The extended STIRPAT (Stochastic Impacts by Regression on Population, Affluence, and Technology), along with various visualisation techniques, is employed to analyse the impacts of population size, primary electricity application level, GDP per capita, the share of the secondary industry in fixed-asset investment, and the number of civilian automobiles owned on carbon emissions. The study found that there is an inverted U-shaped curve relationship between GDP per capita and carbon emissions, car ownership is not a major driver of carbon emissions, and the development of primary electricity has significant potential as a means for reducing carbon emissions in Jiangxi Province. If strict environmental protection measures are implemented, Jiangxi Province can reach its peak carbon target by 2029, one year ahead of the national target. These findings provide valuable insights for policymakers in Jiangxi Province to ensure that their environmental objectives are met.
(This article belongs to the Special Issue Air Pollution in China (3rd Edition))

20 pages, 36293 KiB  
Article
ICTH: Local-to-Global Spectral Reconstruction Network for Heterosource Hyperspectral Images
by Haozhe Zhou, Zhanhao Liu, Zhenpu Huang, Xuguang Wang, Wen Su and Yanchao Zhang
Remote Sens. 2024, 16(18), 3377; https://doi.org/10.3390/rs16183377 - 11 Sep 2024
Abstract
To address the high cost associated with acquiring hyperspectral data, spectral reconstruction (SR) has emerged as a prominent research area. However, contemporary SR techniques are more focused on image processing tasks in computer vision than on practical applications. Furthermore, the prevalent approach of employing single-dimensional features to guide reconstruction, aimed at reducing computational overhead, invariably compromises reconstruction accuracy, particularly in complex environments with intricate ground features and severe spectral mixing. Effectively utilizing both local and global information in spatial and spectral dimensions for spectral reconstruction remains a significant challenge. To tackle these challenges, this study proposes an integrated network of 3D CNN and U-shaped Transformer for heterosource spectral reconstruction, ICTH, which comprises a shallow feature extraction module (CSSM) and a deep feature extraction module (TDEM), implementing a coarse-to-fine spectral reconstruction scheme. To minimize information loss, we designed a novel spatial–spectral attention module (S2AM) as the foundation for constructing a U-transformer, enhancing the capture of long-range information across all dimensions. On three hyperspectral datasets, ICTH has exhibited remarkable strengths across quantitative, qualitative, and single-band detail assessments, while also revealing significant potential for subsequent applications, such as generalizability and vegetation index calculations, in two real-world datasets.
(This article belongs to the Special Issue Geospatial Artificial Intelligence (GeoAI) in Remote Sensing)

17 pages, 110874 KiB  
Article
RT-CBAM: Refined Transformer Combined with Convolutional Block Attention Module for Underwater Image Restoration
by Renchuan Ye, Yuqiang Qian and Xinming Huang
Sensors 2024, 24(18), 5893; https://doi.org/10.3390/s24185893 - 11 Sep 2024
Abstract
Recently, transformers have demonstrated notable improvements in advanced visual tasks. In the field of computer vision, transformer networks are beginning to supplant conventional convolutional neural networks (CNNs) due to their global receptive field and adaptability. Although transformers excel in capturing global features, they lag behind CNNs in handling fine local features, especially when dealing with underwater images containing complex and delicate structures. In order to tackle this challenge, we propose a refined transformer model by improving the feature blocks (dilated transformer block) to more accurately compute attention weights, enhancing the capture of both local and global features. Subsequently, a self-supervised method (a local and global blind-patch network) is embedded in the bottleneck layer, which can aggregate local and global information to enhance detail recovery and improve texture restoration quality. Additionally, we introduce a multi-scale convolutional block attention module (MSCBAM) to connect encoder and decoder features; this module enhances the feature representation of color channels, aiding in the restoration of color information in images. We plan to deploy this deep learning model onto the sensors of underwater robots for real-world underwater image-processing and ocean exploration tasks. Our model is named the refined transformer combined with convolutional block attention module (RT-CBAM). This study compares two traditional methods and six deep learning methods, and our approach achieved the best results in terms of detail processing and color restoration.
(This article belongs to the Section Sensing and Imaging)

23 pages, 11097 KiB  
Article
Multimodal Framework for Fine and Gross Upper-Limb Motor Coordination Assessment Using Serious Games and Robotics
by Edwin Daniel Oña, Norali Pernalete and Alberto Jardón
Appl. Sci. 2024, 14(18), 8175; https://doi.org/10.3390/app14188175 - 11 Sep 2024
Abstract
A critical element of neurological function is eye–hand coordination: the ability of our vision system to coordinate the information received through the eyes to control, guide, and direct the hands to accomplish a task. Recent evidence shows that this ability can be disturbed by strokes or other neurological disorders, with critical consequences for motor behaviour. This paper presents a system based on serious games and multimodal devices aimed at improving the assessment of eye–hand coordination. The system implements gameplay that involves drawing specific patterns (labyrinths) to capture hand trajectories. The user can draw the path using multimodal devices such as a mouse, a stylus with a tablet, or robotic devices. Multimodal input devices can allow for the evaluation of complex coordinated movements of the upper limb that involve the synergistic motion of arm joints, depending on the device. A preliminary test of technological validation with healthy volunteers was conducted in the laboratory. The Dynamic Time Warping (DTW) index was used to compare hand trajectories without considering time-series lag. The results suggest that this multimodal framework allows for measuring differences between fine and gross motor skills. Moreover, the results support the viability of this system for developing a high-resolution metric for measuring eye–hand coordination in neurorehabilitation.
(This article belongs to the Special Issue Robotics, IoT and AI Technologies in Bioengineering)

1 page, 132 KiB  
Correction
Correction: Muniyappan et al. Stability and Numerical Solutions of Second Wave Mathematical Modeling on COVID-19 and Omicron Outbreak Strategy of Pandemic: Analytical and Error Analysis of Approximate Series Solutions by Using HPM. Mathematics 2022, 10, 343
by Ashwin Muniyappan, Balamuralitharan Sundarappan, Poongodi Manoharan, Mounir Hamdi, Kaamran Raahemifar, Sami Bourouis and Vijayakumar Varadarajan
Mathematics 2024, 12(18), 2816; https://doi.org/10.3390/math12182816 - 11 Sep 2024
Abstract
In the original publication [...]
(This article belongs to the Section Mathematical Biology)
16 pages, 12121 KiB  
Article
Hardware-in-the-Loop Simulation of Flywheel Energy Storage Systems for Power Control in Wind Farms
by Li Yang and Qiaoni Zhao
Electronics 2024, 13(18), 3610; https://doi.org/10.3390/electronics13183610 - 11 Sep 2024
Abstract
Flywheel energy storage systems (FESSs) are widely used for power regulation in wind farms as they can balance the wind farms’ output power and improve the wind power grid connection rate. Due to the complex environment of wind farms, it is costly and time-consuming to repeatedly debug the system on-site. To save research costs and shorten research cycles, a hardware-in-the-loop (HIL) testing system was built to provide a convenient testing environment for the research of FESSs on wind farms. The focus of this study is the construction of mathematical models in the HIL testing system. Firstly, a mathematical model of the FESS main circuit is established using a hierarchical method. Secondly, the principle of the permanent magnet synchronous motor (PMSM) is analyzed, and a nonlinear dq mathematical model of the PMSM is established by referring to the relationship among d-axis inductance, q-axis inductance, and permanent magnet flux change with respect to the motor’s current. Then, the power grid and wind farm test models are established. Finally, the established mathematical models are applied to the HIL testing system. The experimental results indicated that the HIL testing system can provide a convenient testing environment for the optimization of FESS control algorithms.

17 pages, 1119 KiB  
Review
Clinical and Ocular Inflammatory Inhibitors of Viral-Based Gene Therapy of the Retina
by Marc Ohlhausen and Christopher D. Conrady
Acta Microbiol. Hell. 2024, 69(3), 187-203; https://doi.org/10.3390/amh69030018 - 11 Sep 2024
Abstract
Gene therapy is an emerging field of medicine that can target and treat previously untreatable blinding or lethal diseases. Within the field of ophthalmology, gene therapy has emerged to treat retinal degenerative disorders, but its exact role is still being defined. While this exciting frontier is rapidly expanding, these typically viral-based gene therapy vectors trigger a host immune response. Thus, a better understanding of the host immune response to gene therapies is critical, in that harnessing immunity to these vectors may improve treatment efficacy and reduce the risk of vision loss from inflammation. As such, we will discuss innate and adaptive immunity to gene therapy vectors, and avenues through which this response may be harnessed to improve visual outcomes.
(This article belongs to the Special Issue Feature Papers in Medical Microbiology in 2024)

26 pages, 1405 KiB  
Review
Image Analysis in Autonomous Vehicles: A Review of the Latest AI Solutions and Their Comparison
by Michał Kozłowski, Szymon Racewicz and Sławomir Wierzbicki
Appl. Sci. 2024, 14(18), 8150; https://doi.org/10.3390/app14188150 - 11 Sep 2024
Abstract
The integration of advanced image analysis using artificial intelligence (AI) is pivotal for the evolution of autonomous vehicles (AVs). This article provides a thorough review of the most significant datasets and latest state-of-the-art AI solutions employed in image analysis for AVs. Datasets such as Cityscapes, NuScenes, CARLA, and Talk2Car form the benchmarks for training and evaluating different AI models, with unique characteristics catering to various aspects of autonomous driving. Key AI methodologies, including Convolutional Neural Networks (CNNs), Transformer models, Generative Adversarial Networks (GANs), and Vision Language Models (VLMs), are discussed. The article also presents a comparative analysis of various AI techniques in real-world scenarios, focusing on semantic image segmentation, 3D object detection, vehicle control in virtual environments, and vehicle interaction using natural language. Simultaneously, the roles of multisensor datasets and simulation platforms like AirSim, TORCS, and SUMMIT in enriching the training data and testing environments for AVs are highlighted. By synthesizing information on datasets, AI solutions, and comparative performance evaluations, this article serves as a crucial resource for researchers, developers, and industry stakeholders, offering a clear view of the current landscape and future directions in autonomous vehicle image analysis technologies.
(This article belongs to the Special Issue Future Autonomous Vehicles and Their Systems)

20 pages, 3519 KiB  
Article
The Implementation of Multimodal Large Language Models for Hydrological Applications: A Comparative Study of GPT-4 Vision, Gemini, LLaVa, and Multimodal-GPT
by Likith Anoop Kadiyala, Omer Mermer, Dinesh Jackson Samuel, Yusuf Sermet and Ibrahim Demir
Hydrology 2024, 11(9), 148; https://doi.org/10.3390/hydrology11090148 - 11 Sep 2024
Abstract
Large Language Models (LLMs) combined with visual foundation models have demonstrated significant advancements, achieving intelligence levels comparable to human capabilities. This study analyzes the latest Multimodal LLMs (MLLMs), including Multimodal-GPT, GPT-4 Vision, Gemini, and LLaVa, with a focus on hydrological applications such as flood management, water level monitoring, agricultural water discharge, and water pollution management. We evaluated these MLLMs on hydrology-specific tasks, testing their response generation and real-time suitability in complex real-world scenarios. Prompts were designed to enhance the models’ visual inference capabilities and contextual comprehension from images. Our findings reveal that GPT-4 Vision demonstrated exceptional proficiency in interpreting visual data, providing accurate assessments of flood severity and water quality. Additionally, MLLMs showed potential in various hydrological applications, including drought prediction, streamflow forecasting, groundwater management, and wetland conservation. These models can optimize water resource management by predicting rainfall, evaporation rates, and soil moisture levels, thereby promoting sustainable agricultural practices. This research provides valuable insights into the potential applications of advanced AI models in addressing complex hydrological challenges and improving real-time decision-making in water resource management.

10 pages, 722 KiB  
Article
WaterSAM: Adapting SAM for Underwater Object Segmentation
by Yang Hong, Xiaowei Zhou, Ruzhuang Hua, Qingxuan Lv and Junyu Dong
J. Mar. Sci. Eng. 2024, 12(9), 1616; https://doi.org/10.3390/jmse12091616 - 11 Sep 2024
Abstract
Object segmentation, a key type of image segmentation, focuses on detecting and delineating individual objects within an image, essential for applications like robotic vision and augmented reality. Despite advancements in deep learning improving object segmentation, underwater object segmentation remains challenging due to unique underwater complexities such as turbulence diffusion, light absorption, noise, low contrast, uneven illumination, and intricate backgrounds. The scarcity of underwater datasets further complicates these challenges. The Segment Anything Model (SAM) has shown potential in addressing these issues, but its adaptation for underwater environments, AquaSAM, requires fine-tuning all parameters, demanding more labeled data and high computational costs. In this paper, we propose WaterSAM, an adapted model for underwater object segmentation. Inspired by Low-Rank Adaptation (LoRA), WaterSAM incorporates trainable rank decomposition matrices into the Transformer’s layers, specifically enhancing the image encoder. This approach significantly reduces the number of trainable parameters to 6.7% of SAM’s parameters, lowering computational costs. We validated WaterSAM on three underwater image datasets: COD10K, SUIM, and UIIS. Results demonstrate that WaterSAM significantly outperforms pre-trained SAM in underwater segmentation tasks, contributing to advancements in marine biology, underwater archaeology, and environmental monitoring.
