Search Results (285)

Search Parameters:
Keywords = embedded computer vision

21 pages, 6878 KiB  
Article
Microscopic Insect Pest Detection in Tea Plantations: Improved YOLOv8 Model Based on Deep Learning
by Zejun Wang, Shihao Zhang, Lijiao Chen, Wendou Wu, Houqiao Wang, Xiaohui Liu, Zongpei Fan and Baijuan Wang
Agriculture 2024, 14(10), 1739; https://doi.org/10.3390/agriculture14101739 - 2 Oct 2024
Viewed by 382
Abstract
Pest infestations in tea gardens are a common issue encountered during tea cultivation. This study introduces an improved YOLOv8 network model for the detection of tea pests to facilitate the rapid and accurate identification of early-stage micro-pests, addressing challenges such as small datasets and the difficulty of extracting phenotypic features of target pests in tea pest detection. Based on the original YOLOv8 network framework, this study adopts the SIoU optimized loss function to enhance the model’s learning ability for pest samples. AKConv is introduced to replace certain network structures, enhancing feature extraction capabilities and reducing the number of model parameters. A Vision Transformer with Bi-Level Routing Attention is embedded to provide the model with more flexible computation allocation and improve its ability to capture target position information. Experimental results show that the improved YOLOv8 network achieves a detection accuracy of 98.16% for tea pest detection, a 2.62% improvement over the original YOLOv8 network. Compared with the YOLOv10, YOLOv9, YOLOv7, Faster RCNN, and SSD models, the improved YOLOv8 network increases the mAP value by 3.12%, 4.34%, 5.44%, 16.54%, and 11.29%, respectively, enabling fast and accurate identification of early-stage micro-pests in tea gardens. This study thus proposes a deep learning-based improved YOLOv8 model for detecting micro-pests in tea, providing a viable research method and a useful reference for micro-pest identification, and offering an effective pathway for the high-quality development of Yunnan’s ecological tea industry and the healthy growth of the tea sector.
(This article belongs to the Section Digital Agriculture)

18 pages, 9000 KiB  
Article
Multilevel Geometric Feature Embedding in Transformer Network for ALS Point Cloud Semantic Segmentation
by Zhuanxin Liang and Xudong Lai
Remote Sens. 2024, 16(18), 3386; https://doi.org/10.3390/rs16183386 - 12 Sep 2024
Viewed by 421
Abstract
Effective semantic segmentation of Airborne Laser Scanning (ALS) point clouds is a crucial field of study and influences subsequent point cloud application tasks. Transformer networks have made significant progress in 2D/3D computer vision tasks, exhibiting superior performance. We propose a multilevel geometric feature embedding transformer network (MGFE-T), which aims to fully utilize the three-dimensional structural information carried by point clouds and enhance transformer performance in ALS point cloud semantic segmentation. In the encoding stage, we compute the geometric features surrounding the sampling points at each layer and embed them into the transformer workflow. To ensure that the receptive field of the self-attention mechanism and the geometric computation domain maintain a consistent scale at each layer, we propose a fixed-radius dilated KNN (FR-DKNN) search method to address the limitation of traditional KNN search methods in considering the domain radius. In the decoding stage, we aggregate prediction deviations at each level into a unified loss value, enabling multilevel supervision to improve the network’s feature learning ability at different levels. The MGFE-T network predicts the class label of each point in an end-to-end manner. Experiments were conducted on three widely used benchmark datasets. The results indicate that the MGFE-T network achieves superior OA and mF1 scores on the LASDU and DFC2019 datasets and performs well on the ISPRS dataset with imbalanced classes.
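
As a rough illustration of the FR-DKNN idea described above (not code from the article), the following NumPy sketch selects, for each query point, every dilation-th neighbor by distance among the points inside a fixed radius; the function name, parameters, and defaults are assumptions.

# Minimal sketch of a fixed-radius dilated KNN (FR-DKNN) style neighbor search.
# All names and defaults are illustrative; the paper's exact formulation may differ.
import numpy as np

def fr_dknn(points, queries, k=16, dilation=2, radius=1.0):
    """For each query, take every `dilation`-th neighbor (by distance) among the
    points within `radius`, up to k neighbors. Pads with -1 where fewer than k exist."""
    idx_out = np.full((len(queries), k), -1, dtype=np.int64)
    for qi, q in enumerate(queries):
        d = np.linalg.norm(points - q, axis=1)          # distances to all points
        in_range = np.nonzero(d <= radius)[0]           # fixed-radius constraint
        order = in_range[np.argsort(d[in_range])]       # sort candidates by distance
        picked = order[::dilation][:k]                  # dilated selection
        idx_out[qi, :len(picked)] = picked
    return idx_out

pts = np.random.rand(1024, 3).astype(np.float32)
neighbors = fr_dknn(pts, pts[:4], k=8, dilation=2, radius=0.2)
print(neighbors.shape)  # (4, 8)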

17 pages, 110874 KiB  
Article
RT-CBAM: Refined Transformer Combined with Convolutional Block Attention Module for Underwater Image Restoration
by Renchuan Ye, Yuqiang Qian and Xinming Huang
Sensors 2024, 24(18), 5893; https://doi.org/10.3390/s24185893 - 11 Sep 2024
Viewed by 422
Abstract
Recently, transformers have demonstrated notable improvements in advanced visual tasks. In the field of computer vision, transformer networks are beginning to supplant conventional convolutional neural networks (CNNs) due to their global receptive field and adaptability. Although transformers excel in capturing global features, they lag behind CNNs in handling fine local features, especially when dealing with underwater images containing complex and delicate structures. In order to tackle this challenge, we propose a refined transformer model by improving the feature blocks (dilated transformer block) to more accurately compute attention weights, enhancing the capture of both local and global features. Subsequently, a self-supervised method (a local and global blind-patch network) is embedded in the bottleneck layer, which can aggregate local and global information to enhance detail recovery and improve texture restoration quality. Additionally, we introduce a multi-scale convolutional block attention module (MSCBAM) to connect encoder and decoder features; this module enhances the feature representation of color channels, aiding in the restoration of color information in images. We plan to deploy this deep learning model onto the sensors of underwater robots for real-world underwater image-processing and ocean exploration tasks. Our model is named the refined transformer combined with convolutional block attention module (RT-CBAM). This study compares our approach with two traditional methods and six deep learning methods, and it achieves the best results in terms of detail processing and color restoration.
(This article belongs to the Section Sensing and Imaging)
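
For context, the MSCBAM named above builds on the standard convolutional block attention module; the PyTorch sketch below shows only a plain CBAM (channel attention followed by spatial attention). The multi-scale variant and its encoder–decoder placement are not specified in the abstract, so the hyperparameters here are purely illustrative.

# Minimal PyTorch sketch of a standard CBAM block (channel + spatial attention).
# The paper's multi-scale variant (MSCBAM) is not detailed in the abstract; this
# only illustrates the base mechanism it builds on.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # Channel attention: pooled descriptors -> shared MLP -> sigmoid gate.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: channel-wise mean/max maps -> conv -> sigmoid gate.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feat = torch.randn(1, 64, 32, 32)
print(CBAM(64)(feat).shape)  # torch.Size([1, 64, 32, 32])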

19 pages, 5685 KiB  
Article
HRA-YOLO: An Effective Detection Model for Underwater Fish
by Hongru Wang, Jingtao Zhang and Hu Cheng
Electronics 2024, 13(17), 3547; https://doi.org/10.3390/electronics13173547 - 6 Sep 2024
Viewed by 419
Abstract
In intelligent fisheries, accurate fish detection is essential to monitor underwater ecosystems. By utilizing underwater cameras and computer vision technologies to detect fish distribution, timely feedback can be provided to staff, enabling effective fishery management. This paper proposes a lightweight underwater fish detection algorithm based on YOLOv8s, named HRA-YOLO, to meet the demand for a high-precision and lightweight object detection algorithm. First, the lightweight network High-Performance GPU Net (HGNetV2) is used to substitute the backbone network of the YOLOv8s model to lower the computational cost and reduce the size of the model. Second, to enhance the capability of extracting fish feature information and reduce missed detections, we design a residual attention (RA) module, which is formulated by embedding the efficient multiscale attention (EMA) mechanism at the end of the Dilation-Wise Residual (DWR) module. We then adopt the RA module to replace the bottleneck of the YOLOv8s model to increase detection precision. Taking universality into account, we establish an underwater fish dataset for our subsequent experiments by collecting data in various waters. Comprehensive experiments are carried out on this self-constructed fish dataset. The results demonstrate that the precision of the HRA-YOLO model improves to 93.1%, surpassing the original YOLOv8s model, while the computational complexity is reduced by 19% (5.4 GFLOPs) and the model size is decreased by 25.3% (5.7 MB). Compared with other state-of-the-art detection models, our model shows superior overall performance. We also perform experiments on other datasets to verify the adaptability of our model. The experimental results on the Fish Market dataset indicate that our model has better overall performance than the original model and generalizes well.

18 pages, 66715 KiB  
Article
Vehicle Ego-Trajectory Segmentation Using Guidance Cues
by Andrei Mihalea and Adina Magda Florea
Appl. Sci. 2024, 14(17), 7776; https://doi.org/10.3390/app14177776 - 3 Sep 2024
Viewed by 523
Abstract
Computer vision has significantly influenced recent advancements in autonomous driving by providing cutting-edge solutions for various challenges, including object detection, semantic segmentation, and comprehensive scene understanding. One specific challenge is ego-vehicle trajectory segmentation, which involves learning the vehicle’s path and describing it with a segmentation map. This can play an important role in both autonomous driving and advanced driver assistance systems, as it enhances the accuracy of perceiving and forecasting the vehicle’s movements across different driving scenarios. In this work, we propose a deep learning approach for ego-trajectory segmentation that leverages a state-of-the-art segmentation network augmented with guidance cues provided through various merging mechanisms. These mechanisms are designed to direct the vehicle’s path as intended, utilizing training data obtained with a self-supervised approach. Our results demonstrate the feasibility of using self-supervised labels for ego-trajectory segmentation and of embedding directional intentions within the network’s decisions through image and guidance input concatenation, feature concatenation, or cross-attention between pixel features and various types of guidance cues. We also analyze the effectiveness of our approach in constraining the segmentation outputs and show that the proposed improvements bring substantial gains in the segmentation metrics, increasing IoU by more than 12% and 5% compared with our two baseline models. This work paves the way for further exploration of ego-trajectory segmentation methods aimed at better predicting the behavior of autonomous vehicles.
(This article belongs to the Special Issue Intelligent Transportation System Technologies and Applications)
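
Two of the merging mechanisms named in the abstract (feature concatenation, and cross-attention between pixel features and a guidance embedding) can be pictured with the short PyTorch sketch below; the tensor shapes, projection layers, and module choices are assumptions for illustration, not the article's implementation.

# Illustrative PyTorch sketch of two guidance-merging mechanisms: feature
# concatenation and cross-attention between pixel features and a guidance token.
# Shapes, projections, and hyperparameters are assumptions.
import torch
import torch.nn as nn

B, C, H, W, G = 2, 256, 32, 64, 64    # batch, feature channels, height, width, guidance dim
pixel_feat = torch.randn(B, C, H, W)   # backbone features
guidance = torch.randn(B, G)           # e.g., an encoded turn/route intention

# 1) Feature concatenation: broadcast the guidance vector over the spatial grid.
g_map = guidance.view(B, G, 1, 1).expand(B, G, H, W)
fused_concat = nn.Conv2d(C + G, C, kernel_size=1)(torch.cat([pixel_feat, g_map], dim=1))

# 2) Cross-attention: pixel features (queries) attend to the guidance token (key/value).
attn = nn.MultiheadAttention(embed_dim=C, num_heads=8, batch_first=True)
pixels = pixel_feat.flatten(2).transpose(1, 2)            # (B, H*W, C) queries
g_tok = nn.Linear(G, C)(guidance).unsqueeze(1)            # (B, 1, C) key/value token
fused_attn, _ = attn(pixels, g_tok, g_tok)                # (B, H*W, C)
fused_attn = fused_attn.transpose(1, 2).reshape(B, C, H, W)

print(fused_concat.shape, fused_attn.shape)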

15 pages, 3876 KiB  
Article
Knowledge Distillation for Enhanced Age and Gender Prediction Accuracy
by Seunghyun Kim, Yeongje Park and Eui Chul Lee
Mathematics 2024, 12(17), 2647; https://doi.org/10.3390/math12172647 - 26 Aug 2024
Viewed by 540
Abstract
In recent years, the ability to accurately predict age and gender from facial images has gained significant traction across various fields such as personalized marketing, human–computer interaction, and security surveillance. However, the high computational cost of current models limits their practicality for real-time applications on resource-constrained devices. This study addresses this challenge by leveraging knowledge distillation to develop lightweight age and gender prediction models that maintain high accuracy. We propose a knowledge distillation method using teacher bounds for the efficient learning of small models for age and gender. This method allows the student model to selectively receive the teacher model’s knowledge, preventing it from unconditionally learning from the teacher in challenging age/gender prediction tasks involving factors like illusions and makeup. Our experiments used MobileNetV3 and EfficientFormer as the student models and Vision Outlooker (VOLO)-D1 as the teacher model, resulting in substantial efficiency improvements. MobileNetV3-Small, one of the student models we experimented with, achieved a 94.27% reduction in parameters and a 99.17% reduction in Giga Floating-Point Operations (GFLOPs). Furthermore, the distilled MobileNetV3-Small model improved gender prediction accuracy from 88.11% to 90.78%. Our findings confirm that knowledge distillation can effectively enhance model performance across diverse demographic groups while ensuring efficiency for deployment on embedded devices. This research advances the development of practical, high-performance AI applications in resource-limited environments.
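
The "teacher bounds" gating is not spelled out in the abstract; one plausible reading is sketched below in PyTorch, where the distillation term is applied only on samples for which the teacher's own loss does not exceed the student's. This is an assumption for illustration, not the paper's exact rule, and the temperature and weighting values are placeholders.

# Hedged sketch of a "teacher-bounded" distillation loss: the KD term is applied
# only where the teacher's own loss is within a bound (here, the student's loss),
# so the student is not forced to imitate a struggling teacher.
import torch
import torch.nn.functional as F

def bounded_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    ce_student = F.cross_entropy(student_logits, labels, reduction="none")
    ce_teacher = F.cross_entropy(teacher_logits, labels, reduction="none")
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="none",
    ).sum(dim=1) * (T * T)
    gate = (ce_teacher <= ce_student).float()     # distill only where the teacher is better
    return (ce_student + alpha * gate * kd).mean()

s = torch.randn(8, 101)   # e.g., 101 age classes (illustrative)
t = torch.randn(8, 101)
y = torch.randint(0, 101, (8,))
print(bounded_kd_loss(s, t, y))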

18 pages, 9357 KiB  
Article
Drone-DETR: Efficient Small Object Detection for Remote Sensing Image Using Enhanced RT-DETR Model
by Yaning Kong, Xiangfeng Shang and Shijie Jia
Sensors 2024, 24(17), 5496; https://doi.org/10.3390/s24175496 - 24 Aug 2024
Viewed by 1039
Abstract
Performing low-latency, high-precision object detection on unmanned aerial vehicles (UAVs) equipped with vision sensors holds significant importance. However, the current limitations of embedded UAV devices present challenges in balancing accuracy and speed, particularly in the analysis of high-precision remote sensing images. This challenge is particularly pronounced in scenarios involving numerous small objects, intricate backgrounds, and occluded overlaps. To address these issues, we introduce the Drone-DETR model, which is based on RT-DETR. To overcome the difficulties associated with detecting small objects and reducing redundant computations arising from complex backgrounds in ultra-wide-angle images, we propose the Effective Small Object Detection Network (ESDNet). This network preserves detailed information about small objects, reduces redundant computations, and adopts a lightweight architecture. Furthermore, we introduce the Enhanced Dual-Path Feature Fusion Attention Module (EDF-FAM) within the neck network. This module is specifically designed to strengthen the network’s ability to handle multi-scale objects. We employ a dynamic competitive learning strategy to improve the model’s capability to efficiently fuse multi-scale features. Additionally, we incorporate the P2 shallow feature layer from the ESDNet into the neck network to improve the fusion of small-object features, thereby increasing the accuracy of small-object detection. Experimental results indicate that the Drone-DETR model achieves an mAP50 of 53.9% with only 28.7 million parameters on the VisDrone2019 dataset, representing an 8.1% improvement over RT-DETR-R18.

39 pages, 2593 KiB  
Review
From Near-Sensor to In-Sensor: A State-of-the-Art Review of Embedded AI Vision Systems
by William Fabre, Karim Haroun, Vincent Lorrain, Maria Lepecq and Gilles Sicard
Sensors 2024, 24(16), 5446; https://doi.org/10.3390/s24165446 - 22 Aug 2024
Viewed by 1066
Abstract
In modern cyber-physical systems, the integration of AI into vision pipelines is now a standard practice for applications ranging from autonomous vehicles to mobile devices. Traditional AI integration often relies on cloud-based processing, which faces challenges such as data access bottlenecks, increased latency, and high power consumption. This article reviews embedded AI vision systems, examining the diverse landscape of near-sensor and in-sensor processing architectures that incorporate convolutional neural networks. We begin with a comprehensive analysis of the critical characteristics and metrics that define the performance of AI-integrated vision systems. These include sensor resolution, frame rate, data bandwidth, computational throughput, latency, power efficiency, and overall system scalability. Understanding these metrics provides a foundation for evaluating how different embedded processing architectures impact the entire vision pipeline, from image capture to AI inference. Our analysis delves into near-sensor systems that leverage dedicated hardware accelerators and commercially available components to efficiently process data close to their source, minimizing data transfer overhead and latency. These systems offer a balance between flexibility and performance, allowing for real-time processing in constrained environments. In addition, we explore in-sensor processing solutions that integrate computational capabilities directly into the sensor. This approach addresses the stringent constraints of embedded applications by significantly reducing data movement and power consumption while also enabling in-sensor feature extraction, pre-processing, and CNN inference. By comparing these approaches, we identify trade-offs related to flexibility, power consumption, and computational performance. Ultimately, this article provides insights into the evolving landscape of embedded AI vision systems and suggests new research directions for the development of next-generation machine vision systems.
(This article belongs to the Special Issue Sensor Technology for Intelligent Control and Computer Visions)

18 pages, 27583 KiB  
Article
A Contrastive Learning Based Multiview Scene Matching Method for UAV View Geo-Localization
by Qiyi He, Ao Xu, Yifan Zhang, Zhiwei Ye, Wen Zhou, Ruijie Xi and Qiao Lin
Remote Sens. 2024, 16(16), 3039; https://doi.org/10.3390/rs16163039 - 19 Aug 2024
Viewed by 555
Abstract
Multi-view scene matching refers to the establishment of a mapping relationship between images captured from different perspectives, such as those taken by unmanned aerial vehicles (UAVs) and satellites. This technology is crucial for the geo-localization of UAV views. However, the geometric discrepancies between images from different perspectives, combined with the inherent computational constraints of UAVs, present significant challenges for matching UAV and satellite images. Additionally, the imbalance of positive and negative samples between drone and satellite images during model training can lead to instability. To address these challenges, this study proposes a novel and efficient cross-view geo-localization framework called MSM-Transformer. The framework employs the Dual Attention Vision Transformer (DaViT) as the core architecture for feature extraction, which significantly enhances the modeling capacity for global features and the contextual relevance of adjacent regions. The weight-sharing mechanism in MSM-Transformer effectively reduces model complexity, making it highly suitable for deployment on embedded devices such as UAVs and satellites. Furthermore, the framework introduces a Symmetric Decoupled Contrastive Learning (DCL) loss function, which effectively mitigates the issue of sample imbalance between satellite and UAV images. Experimental validation on the University-1652 dataset demonstrates that MSM-Transformer achieves outstanding performance, delivering optimal matching results with a minimal number of parameters.
(This article belongs to the Section AI Remote Sensing)
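
As background on the DCL idea referenced above, a decoupled contrastive loss removes the positive pair from the InfoNCE denominator. The minimal symmetric sketch below, with an assumed temperature and one-to-one UAV/satellite pairing, illustrates the mechanism; it is not the article's implementation.

# Minimal sketch of a symmetric decoupled contrastive (DCL-style) loss for paired
# UAV/satellite embeddings: the positive pair is removed from the InfoNCE
# denominator. Temperature and normalization choices are assumptions.
import torch
import torch.nn.functional as F

def dcl_loss(z_uav, z_sat, tau=0.1):
    z_uav = F.normalize(z_uav, dim=1)
    z_sat = F.normalize(z_sat, dim=1)
    sim = z_uav @ z_sat.t() / tau                  # (N, N) scaled cosine similarities
    pos = sim.diag()                               # matched UAV/satellite pairs
    neg_mask = ~torch.eye(len(sim), dtype=torch.bool, device=sim.device)

    def one_side(s):
        # -positive + logsumexp over negatives only (positive decoupled from denominator)
        return (-pos + torch.logsumexp(s.masked_fill(~neg_mask, float("-inf")), dim=1)).mean()

    return 0.5 * (one_side(sim) + one_side(sim.t()))

u = torch.randn(32, 512)
s = torch.randn(32, 512)
print(dcl_loss(u, s))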

18 pages, 7285 KiB  
Article
A Real-Time Intelligent Valve Monitoring Approach through Cameras Based on Computer Vision Methods
by Zihui Zhang, Qiyuan Zhou, Heping Jin, Qian Li and Yiyang Dai
Sensors 2024, 24(16), 5337; https://doi.org/10.3390/s24165337 - 18 Aug 2024
Viewed by 551
Abstract
Abnormal valve positions can lead to fluctuations in the process industry, potentially triggering serious accidents. For processes that frequently require operational switching, such as green chemical processes based on renewable energy or biotechnological fermentation processes, this issue becomes even more severe. Despite this risk, many plants still rely on manual inspections to check valve status. The widespread use of cameras in large plants now makes it feasible to monitor valve positions through computer vision technology. This paper proposes a novel real-time valve monitoring approach based on computer vision to detect abnormalities in valve positions. Utilizing an improved network architecture based on YOLOv8, the method performs valve detection and feature recognition. To address the challenge of small, relatively fixed-position valves in the images, a coordinate attention module is introduced, embedding position information into the feature channels and enhancing the accuracy of valve rotation feature extraction. The valve position is then calculated using a rotation algorithm with the valve’s center point and bounding box coordinates, triggering an alarm for valves that exceed a pre-set threshold. The accuracy and generalization ability of the proposed approach are evaluated through experiments on three different types of valves in two industrial scenarios. The results demonstrate that the method meets the accuracy and robustness standards required for real-time valve monitoring in industrial applications.
(This article belongs to the Section Industrial Sensors)
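
The rotation-and-threshold step described above might look roughly like the following sketch, which estimates a handle angle from detected centre and endpoint coordinates and raises an alarm beyond a preset deviation; the actual rotation algorithm and keypoint definitions are not detailed in the abstract, so all names and values here are assumptions.

# Simplified sketch of the alarm logic: estimate a valve rotation angle from the
# detected centre/handle geometry and flag it if it deviates from the expected
# position by more than a preset threshold.
import math

def valve_angle(center_xy, handle_xy):
    """Angle (degrees) of the handle endpoint relative to the valve centre."""
    dx = handle_xy[0] - center_xy[0]
    dy = handle_xy[1] - center_xy[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0

def check_valve(center_xy, handle_xy, expected_deg, threshold_deg=15.0):
    angle = valve_angle(center_xy, handle_xy)
    deviation = abs((angle - expected_deg + 180.0) % 360.0 - 180.0)  # wrap-aware difference
    return {"angle": angle, "deviation": deviation, "alarm": deviation > threshold_deg}

print(check_valve(center_xy=(320, 240), handle_xy=(360, 250), expected_deg=0.0))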

19 pages, 2620 KiB  
Article
Research on the Application of Pruning Algorithm Based on Local Linear Embedding Method in Traffic Sign Recognition
by Wei Wang and Xiaorui Liu
Appl. Sci. 2024, 14(16), 7184; https://doi.org/10.3390/app14167184 - 15 Aug 2024
Viewed by 471
Abstract
Efficient traffic sign recognition is crucial to facilitating the intelligent driving of new energy vehicles. However, current approaches like the Vision Transformer (ViT) model often impose high storage and computational demands, escalating hardware costs. This paper presents a similarity filter pruning method based on locally linear embedding. Using the alternating direction method of multipliers and the locally linear embedding loss in the model training objective, the proposed method prunes the model mainly by evaluating the similarity of the filters within each network layer. Filters whose similarity exceeds a pre-set pruning threshold are identified, and of each similar pair, the filter with the larger cross-entropy value is retained. The results from the Belgium Traffic Sign (BelgiumTS) and German Traffic Sign Recognition Benchmark (GTSRB) datasets indicate that the proposed similarity filter pruning based on local linear embedding (SJ-LLE) algorithm can reduce the number of parameters of the multi-head self-attention module and Multi-layer Perceptron (MLP) module of the ViT model by more than 60%, with an acceptable loss of model accuracy. The scale of the ViT model is thus greatly reduced, which is conducive to applying this model in embedded traffic sign recognition equipment. This paper also experimentally verifies the hypothesis that “using the LLE algorithm as the loss function for model training before pruning plays a positive role in reducing the loss of model performance in the pruning process”.
(This article belongs to the Special Issue Optimization and Simulation Techniques for Transportation)
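
The filter-similarity step can be pictured with the short sketch below, which flattens each convolution filter, computes pairwise cosine similarity, and marks near-duplicates as pruning candidates. The similarity threshold and the selection rule are assumptions; the paper's cross-entropy-based choice of which filter to keep and its LLE/ADMM training loss are not reproduced here.

# Sketch of a filter-similarity pruning step: within one conv layer, flatten each
# filter, compute pairwise cosine similarity, and mark one filter of every pair
# above a preset threshold as a pruning candidate.
import torch
import torch.nn.functional as F

def similar_filter_candidates(conv_weight, threshold=0.95):
    """conv_weight: (out_channels, in_channels, kH, kW). Returns indices proposed for removal."""
    flat = F.normalize(conv_weight.flatten(1), dim=1)     # one row per output filter
    sim = flat @ flat.t()                                 # pairwise cosine similarity
    to_prune = set()
    n = sim.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > threshold and i not in to_prune:
                to_prune.add(j)    # placeholder rule; the paper keeps the higher cross-entropy filter
    return sorted(to_prune)

w = torch.randn(64, 32, 3, 3)
w[1] = w[0] + 0.01 * torch.randn_like(w[0])   # make two filters nearly identical
print(similar_filter_candidates(w))            # expected to contain index 1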

26 pages, 9063 KiB  
Article
Forearm Intravenous Detection and Localization for Autonomous Vein Injection Using Contrast-Limited Adaptive Histogram Equalization Algorithm
by Hany Said, Sherif Mohamed, Omar Shalash, Esraa Khatab, Omar Aman, Ramy Shaaban and Mohamed Hesham
Appl. Sci. 2024, 14(16), 7115; https://doi.org/10.3390/app14167115 - 13 Aug 2024
Viewed by 1166
Abstract
Intravenous insertion occasionally poses a challenge for a number of patients. Inserting an IV needle is a difficult task that requires considerable skill. At present, only doctors and medical personnel are permitted to perform it, since it requires finding the right vein, inserting the needle properly, and carefully injecting fluids or drawing blood. Even for trained professionals, it can be done incorrectly, causing bleeding, infection, or damage to the vein. It is especially difficult in children, elderly people, and people with certain skin conditions, whose veins are harder to see, so the insertion is less likely to succeed on the first attempt and may cause blood clots. In this research, a low-cost embedded system utilizing Near-Infrared (NIR) light technology is developed, and two novel approaches are proposed to detect and select the best candidate veins. The two approaches utilize multiple computer vision tools and are based on contrast-limited adaptive histogram equalization (CLAHE). The accuracy of the proposed algorithm is 91.3%, with an average processing time of 1.4 s on a Raspberry Pi 4 Model B.
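
CLAHE itself is available in OpenCV; the minimal sketch below applies it to a grayscale NIR frame and exposes candidate vein regions with a simple inverted adaptive threshold. The clip limit, tile size, thresholding, and file names are illustrative values, not the article's tuned pipeline.

# Minimal OpenCV sketch of a CLAHE-based enhancement step for a grayscale NIR
# frame, followed by a simple threshold to expose candidate vein regions.
import cv2
import numpy as np

frame = cv2.imread("forearm_nir.png", cv2.IMREAD_GRAYSCALE)  # hypothetical NIR capture
if frame is None:
    frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # fallback for a dry run

clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
enhanced = clahe.apply(frame)
blurred = cv2.GaussianBlur(enhanced, (5, 5), 0)
# Veins appear darker than surrounding tissue under NIR, so use an inverted adaptive threshold.
veins = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                              cv2.THRESH_BINARY_INV, 21, 5)
contours, _ = cv2.findContours(veins, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"candidate regions: {len(contours)}")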

23 pages, 14538 KiB  
Article
Rep-ViG-Apple: A CNN-GCN Hybrid Model for Apple Detection in Complex Orchard Environments
by Bo Han, Ziao Lu, Jingjing Zhang, Rolla Almodfer, Zhengting Wang, Wei Sun and Luan Dong
Agronomy 2024, 14(8), 1733; https://doi.org/10.3390/agronomy14081733 - 7 Aug 2024
Viewed by 683
Abstract
Accurately recognizing apples in complex environments is essential for automating apple picking operations, particularly under challenging natural conditions such as cloudy, snowy, foggy, and rainy weather, as well as low-light situations. To overcome the challenges of reduced apple target detection accuracy due to branch occlusion, apple overlap, and variations between near and far field scales, we propose the Rep-ViG-Apple algorithm, an advanced version of the YOLO model. The Rep-ViG-Apple algorithm features a sophisticated architecture designed to enhance apple detection performance in difficult conditions. To improve feature extraction for occluded and overlapped apple targets, we developed the inverted residual multi-scale structural reparameterized feature extraction block (RepIRD Block) within the backbone network. We also integrated the sparse graph attention mechanism (SVGA) to capture global feature information, concentrate attention on apples, and reduce interference from complex environmental features. Moreover, we designed a feature extraction network with a CNN-GCN architecture, termed Rep-Vision-GCN. This network combines the local multi-scale feature extraction capabilities of a convolutional neural network (CNN) with the global modeling strengths of a graph convolutional network (GCN), enhancing the extraction of apple features. The RepConvsBlock module, embedded in the neck network, forms the Rep-FPN-PAN feature fusion network, which improves the recognition of apple targets across various scales, both near and far. Furthermore, we implemented a channel pruning algorithm based on LAMP scores to balance computational efficiency with model accuracy. Experimental results demonstrate that the Rep-ViG-Apple algorithm achieves precision, recall, and average accuracy of 92.5%, 85.0%, and 93.3%, respectively, marking improvements of 1.5%, 1.5%, and 2.0% over YOLOv8n. Additionally, the Rep-ViG-Apple model benefits from a 22% reduction in size, enhancing its efficiency and suitability for deployment in resource-constrained environments while maintaining high accuracy.
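
The LAMP-based channel pruning mentioned above can be pictured with the hedged sketch below, which adapts the LAMP score to output channels by scoring each channel's squared L2 norm against the sum of squared norms of all channels at least as large within the same layer; the article's exact channel-level formulation may differ.

# Hedged sketch of a channel-level adaptation of LAMP scoring: per-channel L2
# norms take the role of weight magnitudes; channels with the smallest scores
# across layers would be pruned first.
import torch

def channel_lamp_scores(conv_weight):
    """conv_weight: (out_channels, in_channels, kH, kW) -> per-output-channel LAMP-style scores."""
    norms_sq = conv_weight.flatten(1).pow(2).sum(dim=1)          # squared L2 norm per channel
    order = torch.argsort(norms_sq)                              # ascending
    sorted_sq = norms_sq[order]
    tail_sums = torch.flip(torch.cumsum(torch.flip(sorted_sq, [0]), 0), [0])  # sum over >= current
    scores = torch.empty_like(norms_sq)
    scores[order] = sorted_sq / tail_sums
    return scores

w = torch.randn(64, 32, 3, 3)
scores = channel_lamp_scores(w)
keep = torch.argsort(scores, descending=True)[:48]   # e.g., keep the 48 highest-scoring channels
print(scores.shape, keep.shape)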

32 pages, 30788 KiB  
Article
Illumination and Shadows in Head Rotation: Experiments with Denoising Diffusion Models
by Andrea Asperti, Gabriele Colasuonno and Antonio Guerra
Electronics 2024, 13(15), 3091; https://doi.org/10.3390/electronics13153091 - 5 Aug 2024
Viewed by 744
Abstract
Accurately modeling the effects of illumination and shadows during head rotation is critical in computer vision for enhancing image realism and reducing artifacts. This study delves into the latent space of denoising diffusion models to identify compelling trajectories that can express continuous head rotation under varying lighting conditions. A key contribution of our work is the generation of additional labels from the CelebA dataset, categorizing images into three groups based on prevalent illumination direction: left, center, and right. These labels play a crucial role in our approach, enabling more precise manipulations and improved handling of lighting variations. Leveraging a recent embedding technique for Denoising Diffusion Implicit Models (DDIM), our method achieves noteworthy manipulations, encompassing a wide rotation angle of ±30°, while preserving distinct individual characteristics even under challenging illumination conditions. Our methodology involves computing trajectories that approximate clouds of latent representations of dataset samples with different yaw rotations through linear regression. Specific trajectories are obtained by analyzing subsets of data that share significant attributes with the source image, including light direction. Notably, our approach does not require any specific training of the generative model for the task of rotation; we merely compute and follow specific trajectories in the latent space of a pre-trained face generation model. This article showcases the potential of our approach and its current limitations through a qualitative discussion of notable examples. This study contributes to the ongoing advancements in representation learning and the semantic investigation of the latent space of generative models.
(This article belongs to the Special Issue Generative AI and Its Transformative Potential)
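
The latent-trajectory computation described above can be pictured with the sketch below: a linear regression is fitted from yaw angle to latent codes of an attribute-matched subset and then walked to new angles. The data, dimensions, and anchoring step are placeholders, and the DDIM encoding/decoding and attribute filtering are omitted.

# Hedged sketch of the trajectory idea: fit a linear map from yaw angle to latent
# codes, then sample latents along the fitted line for new yaw angles.
import numpy as np
from sklearn.linear_model import LinearRegression

latent_dim = 512
yaws = np.random.uniform(-30, 30, size=(200, 1))          # yaw labels of the subset (degrees)
latents = np.random.randn(200, latent_dim)                # their latent codes (placeholder)

reg = LinearRegression().fit(yaws, latents)               # one linear trajectory in latent space

source_latent = np.random.randn(latent_dim)               # latent of the source image (placeholder)
source_yaw = np.array([[0.0]])
offset = source_latent - reg.predict(source_yaw)[0]       # anchor the line at the source image

for target_yaw in (-30.0, -15.0, 15.0, 30.0):
    z = reg.predict(np.array([[target_yaw]]))[0] + offset  # latent to decode with the DDIM model
    print(target_yaw, z[:3])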

20 pages, 6281 KiB  
Article
Overlapping Shoeprint Detection by Edge Detection and Deep Learning
by Chengran Li, Ajit Narayanan and Akbar Ghobakhlou
J. Imaging 2024, 10(8), 186; https://doi.org/10.3390/jimaging10080186 - 31 Jul 2024
Viewed by 1020
Abstract
In the field of 2-D image processing and computer vision, accurately detecting and segmenting objects in scenarios where they overlap or are obscured remains a challenge. This difficulty is exacerbated in the analysis of shoeprints used in forensic investigations because they are embedded in noisy environments such as the ground and can be indistinct. Traditional convolutional neural networks (CNNs), despite their success in various image analysis tasks, struggle to accurately delineate overlapping objects due to the complexity of segmenting intertwined textures and boundaries against a background of noise. This study introduces and employs the YOLO (You Only Look Once) model enhanced by edge detection and image segmentation techniques to improve the detection of overlapping shoeprints. By focusing on the critical boundary information between shoeprint textures and the ground, our method demonstrates improvements in sensitivity and precision, achieving confidence levels above 85% for minimally overlapped images and maintaining above 70% for extensively overlapped instances. Heatmaps of convolution layers were generated to show how the network converges towards successful detection using these enhancements. This research may provide a potential methodology for addressing the broader challenge of detecting multiple overlapping objects against noisy backgrounds.
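
One way to picture the edge-enhancement idea is the OpenCV sketch below, which blends a Canny edge map into the input image before detection; the thresholds, blending scheme, and file names are assumptions, and the article's exact enhancement and YOLO integration are not shown.

# Illustrative sketch of an edge-emphasis preprocessing step: compute a Canny
# edge map of a shoeprint photograph and blend it with the original image to
# form a boundary-emphasised detector input.
import cv2
import numpy as np

img = cv2.imread("overlapping_shoeprints.jpg")            # hypothetical input image
if img is None:
    img = np.random.randint(0, 256, (640, 640, 3), dtype=np.uint8)  # fallback for a dry run

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)
edge_bgr = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)
enhanced = cv2.addWeighted(img, 0.7, edge_bgr, 0.3, 0)    # blended, edge-emphasised image
cv2.imwrite("shoeprint_edge_enhanced.jpg", enhanced)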
