Search Results (715)

Search Parameters:
Keywords = scene recognition

19 pages, 20082 KiB  
Article
An Ontology-Based Vehicle Behavior Prediction Method Incorporating Vehicle Light Signal Detection
by Xiaolong Xu, Xiaolin Shi, Yun Chen and Xu Wu
Sensors 2024, 24(19), 6459; https://doi.org/10.3390/s24196459 - 6 Oct 2024
Viewed by 470
Abstract
Although deep learning techniques have potential in vehicle behavior prediction, they struggle to integrate traffic rules and environmental information. Moreover, their black-box nature leads to an opaque and difficult-to-interpret prediction process, limiting their acceptance in practical applications. In contrast, ontology reasoning, which can utilize human domain knowledge and mimic human reasoning, can provide reliable explanations for the speculative results. To address the limitations of the above deep learning methods in the field of vehicle behavior prediction, this paper proposes a front vehicle behavior prediction method that combines deep learning techniques with ontology reasoning. Specifically, YOLOv5s is first selected as the base model for recognizing the brake light status of vehicles. In order to further enhance the performance of the model in complex scenes and small target recognition, the Convolutional Block Attention Module (CBAM) is introduced. In addition, so as to balance the feature information of different scales more efficiently, a weighted bi-directional feature pyramid network (BiFPN) is introduced to replace the original PANet structure in YOLOv5s. Next, using a four-lane intersection as an application scenario, multiple factors affecting vehicle behavior are analyzed. Based on these factors, an ontology model for predicting front vehicle behavior is constructed. Finally, for the purpose of validating the effectiveness of the proposed method, we built our own brake light detection dataset. The accuracy and mAP@0.5 of the improved model on the self-made dataset are 3.9% and 2.5% higher than those of the original model, respectively. Afterwards, representative validation scenarios were selected for inference experiments. The ontology model created in this paper accurately reasoned that the target vehicle would slow down until stopping and then turn left. The reasonableness and practicality of the front vehicle behavior prediction method constructed in this paper are verified. Full article
(This article belongs to the Section Vehicular Sensing)
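
The abstract above credits part of the detection improvement to the Convolutional Block Attention Module (CBAM) added to YOLOv5s. As a rough illustration of what that module computes, here is a minimal PyTorch sketch of the standard CBAM formulation (channel attention from pooled statistics, then spatial attention); the reduction ratio, kernel size, and class names are assumptions rather than the authors' implementation.

```python
# Minimal sketch of CBAM: channel attention followed by spatial attention.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))        # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))         # global max pooling branch
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)         # per-pixel channel mean
        mx = x.amax(dim=1, keepdim=True)          # per-pixel channel max
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class CBAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))                # channel attention, then spatial

if __name__ == "__main__":
    feat = torch.randn(1, 64, 40, 40)             # a dummy detector feature map
    print(CBAM(64)(feat).shape)                   # torch.Size([1, 64, 40, 40])
```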

39 pages, 9734 KiB  
Review
A Survey of Robot Intelligence with Large Language Models
by Hyeongyo Jeong, Haechan Lee, Changwon Kim and Sungtae Shin
Appl. Sci. 2024, 14(19), 8868; https://doi.org/10.3390/app14198868 - 2 Oct 2024
Viewed by 668
Abstract
Since the emergence of ChatGPT, research on large language models (LLMs) has actively progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited exceptional abilities in understanding natural language and planning tasks. These abilities of LLMs are promising in robotics. In general, traditional supervised learning-based robot intelligence systems have a significant lack of adaptability to dynamically changing environments. However, LLMs help a robot intelligence system to improve its generalization ability in dynamic and complex real-world environments. Indeed, findings from ongoing robotics studies indicate that LLMs can significantly improve robots’ behavior planning and execution capabilities. Additionally, vision-language models (VLMs), trained on extensive visual and linguistic data for the visual question answering (VQA) problem, excel at integrating computer vision with natural language processing. VLMs can comprehend visual contexts and execute actions through natural language. They also provide descriptions of scenes in natural language. Several studies have explored the enhancement of robot intelligence using multimodal data, including object recognition and description by VLMs, along with the execution of language-driven commands integrated with visual information. This review paper thoroughly investigates how foundation models such as LLMs and VLMs have been employed to boost robot intelligence. For clarity, the research areas are categorized into five topics: reward design in reinforcement learning, low-level control, high-level planning, manipulation, and scene understanding. This review also summarizes studies that show how foundation models, such as the Eureka model for automating reward function design in reinforcement learning, RT-2 for integrating visual data, language, and robot actions in vision-language-action models, and AutoRT for generating feasible tasks and executing robot behavior policies via LLMs, have improved robot intelligence. Full article

19 pages, 5897 KiB  
Article
Tracking and Behavior Analysis of Group-Housed Pigs Based on a Multi-Object Tracking Approach
by Shuqin Tu, Jiaying Du, Yun Liang, Yuefei Cao, Weidian Chen, Deqin Xiao and Qiong Huang
Animals 2024, 14(19), 2828; https://doi.org/10.3390/ani14192828 - 30 Sep 2024
Viewed by 378
Abstract
Smart farming technologies to track and analyze pig behaviors in natural environments are critical for monitoring the health status and welfare of pigs. This study aimed to develop a robust multi-object tracking (MOT) approach named YOLOv8 + OC-SORT (V8-Sort) for the automatic monitoring of the different behaviors of group-housed pigs. We addressed common challenges such as variable lighting, occlusion, and clustering between pigs, which often lead to significant errors in long-term behavioral monitoring. Our approach offers a reliable solution for real-time behavior tracking, contributing to improved health and welfare management in smart farming systems. First, YOLOv8 is employed for the real-time detection and behavior classification of pigs under variable light and occlusion scenes. Second, OC-SORT is utilized to track each pig to reduce the impact of pigs clustering together and occlusion on tracking. When a target is lost during tracking, OC-SORT can recover the lost trajectory and re-track the target. Finally, to enable automatic long-term monitoring of each pig’s behaviors, we created an automatic behavior analysis algorithm that integrates the behavioral information from detection and the tracking results from OC-SORT. On the one-minute video datasets for pig tracking, the proposed MOT method outperforms JDE, Trackformer, and TransTrack, achieving the highest HOTA, MOTA, and IDF1 scores of 82.0%, 96.3%, and 96.8%, respectively. It also achieved scores of 69.0% for HOTA, 99.7% for MOTA, and 75.1% for IDF1 on sixty-minute video datasets. In terms of pig behavior analysis, the proposed automatic behavior analysis algorithm can record the duration of four types of behaviors for each pig in each pen based on behavior classification and ID information to represent the pigs’ health status and welfare. These results demonstrate that the proposed method exhibits excellent performance in behavior recognition and tracking, providing technical support for prompt anomaly detection and health status monitoring for pig farming managers. Full article
(This article belongs to the Section Pigs)
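
The last step described above, turning per-frame detections and track IDs into per-pig behavior durations, is essentially bookkeeping. Below is a minimal Python sketch of that idea, assuming a simple (track_id, behavior) per-frame format and a fixed frame rate; none of this is the authors' actual algorithm.

```python
# Accumulate how long each tracked animal spends in each behavior class.
from collections import defaultdict

FPS = 25  # assumed camera frame rate

def accumulate_behavior_durations(frames):
    """frames: iterable of per-frame lists of (track_id, behavior) tuples."""
    frame_counts = defaultdict(lambda: defaultdict(int))
    for detections in frames:
        for track_id, behavior in detections:
            frame_counts[track_id][behavior] += 1
    # Convert frame counts to seconds per behavior for each tracked pig.
    return {
        pid: {b: n / FPS for b, n in counts.items()}
        for pid, counts in frame_counts.items()
    }

if __name__ == "__main__":
    demo = [
        [(1, "lying"), (2, "eating")],
        [(1, "lying"), (2, "walking")],
        [(1, "standing"), (2, "walking")],
    ]
    print(accumulate_behavior_durations(demo))
    # {1: {'lying': 0.08, 'standing': 0.04}, 2: {'eating': 0.04, 'walking': 0.08}}
```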

22 pages, 12450 KiB  
Article
Research on the Behavior Recognition of Beef Cattle Based on the Improved Lightweight CBR-YOLO Model Based on YOLOv8 in Multi-Scene Weather
by Ye Mu, Jinghuan Hu, Heyang Wang, Shijun Li, Hang Zhu, Lan Luo, Jinfan Wei, Lingyun Ni, Hongli Chao, Tianli Hu, Yu Sun, He Gong and Ying Guo
Animals 2024, 14(19), 2800; https://doi.org/10.3390/ani14192800 - 27 Sep 2024
Viewed by 392
Abstract
In modern animal husbandry, intelligent digital farming has become the key to improving production efficiency. This paper introduces a model based on improved YOLOv8, Cattle Behavior Recognition-YOLO (CBR-YOLO), which aims to accurately identify the behavior of cattle. We not only generate a variety of weather conditions, but also introduce multi-target detection technology to achieve comprehensive monitoring of cattle and their status. We introduce the Inner-MPDIoU loss, and we have innovatively designed the Multi-Convolutional Focused Pyramid module to explore and learn in depth the detailed features of cattle in different states. Meanwhile, the Lightweight Multi-Scale Feature Fusion Detection Head module is proposed to take advantage of deep convolution, achieving a lightweight network architecture and effectively reducing redundant information. Experimental results prove that our method achieves an average accuracy of 90.2%, an increase of 7.4%, while reducing computation by 3.9 GFLOPs, performing significantly better than 12 SOTA object detection models. By deploying our approach on monitoring computers on farms, we expect to advance the development of automated cattle monitoring systems to improve animal welfare and farm management. Full article
(This article belongs to the Section Cattle)
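
The Inner-MPDIoU loss named above builds on MPDIoU, which penalizes plain IoU by the squared distances between corresponding box corners, normalized by the image diagonal. The sketch below shows only that base MPDIoU computation; the "Inner" variant's scaled auxiliary boxes are omitted, and the box format and image size used for normalization are assumptions.

```python
# Simplified MPDIoU: IoU minus normalized corner-distance penalties.
def mpd_iou(pred, gt, img_w, img_h):
    """pred, gt: boxes as (x1, y1, x2, y2); returns the MPDIoU score."""
    # Intersection area
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + 1e-9)
    # Squared distances between matching corners, normalized by the image diagonal
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2   # top-left corners
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2   # bottom-right corners
    diag2 = img_w ** 2 + img_h ** 2
    return iou - d1 / diag2 - d2 / diag2

if __name__ == "__main__":
    score = mpd_iou((10, 10, 50, 50), (12, 14, 48, 52), img_w=640, img_h=640)
    print(f"MPDIoU = {score:.4f}; loss would be 1 - MPDIoU = {1 - score:.4f}")
```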

13 pages, 3541 KiB  
Technical Note
Damage Scene Change Detection Based on Infrared Polarization Imaging and Fast-PCANet
by Min Yang, Jie Yang, Hongxia Mao and Chong Zheng
Remote Sens. 2024, 16(19), 3559; https://doi.org/10.3390/rs16193559 - 25 Sep 2024
Viewed by 331
Abstract
Change detection based on optical image processing plays a crucial role in the field of damage assessment. Although existing damage scene change detection methods have achieved some good results, they are faced with challenges, such as low accuracy and slow speed in optical image change detection. To solve these problems, an image change detection approach that combines infrared polarization imaging with a fast principal component analysis network (Fast-PCANet) is proposed. Firstly, the acquired infrared polarization images are analyzed, and pixel image blocks are extracted and filtered to obtain the candidate change points. Then, the Fast-PCANet network framework is established, and the candidate pixel image blocks are sent to the network to detect the change pixel points. Finally, the false-detection points predicted by the Fast-PCANet are further corrected by region filling and filtering to obtain the final binary change map of the damage scene. Comparisons with typical PCANet-based change detection algorithms are made on a dataset of infrared-polarized images. The experimental results show that the proposed Fast-PCANet method improves the PCC and the Kappa coefficient of infrared polarization images over infrared intensity images by 6.77% and 13.67%, respectively. Meanwhile, the inference speed can be more than seven times faster. The results verify that the proposed approach is effective and efficient for the change detection task with infrared polarization imaging. The study can be applied to the damage assessment field and has great potential for object recognition, material classification, and polarization remote sensing. Full article
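
The PCC and Kappa figures quoted above are standard agreement metrics between a predicted binary change map and the ground truth. A small NumPy sketch of how they are computed is given below; the array names and the toy data are illustrative, not taken from the paper.

```python
# Percentage of correct classification (PCC) and Cohen's kappa for a change map.
import numpy as np

def pcc_and_kappa(pred, gt):
    """pred, gt: boolean arrays of the same shape (True = changed pixel)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    n = pred.size
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    pcc = (tp + tn) / n                      # overall pixel accuracy
    # Expected agreement by chance, then Cohen's kappa
    pre = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (pcc - pre) / (1 - pre + 1e-12)
    return pcc, kappa

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.random((64, 64)) > 0.9          # sparse "changed" pixels
    pred = gt.copy()
    pred[:2] = ~pred[:2]                     # flip two rows to simulate errors
    print(pcc_and_kappa(pred, gt))
```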

13 pages, 10686 KiB  
Article
HubNet: An E2E Model for Wheel Hub Text Detection and Recognition Using Global and Local Features
by Yue Zeng and Cai Meng
Sensors 2024, 24(19), 6183; https://doi.org/10.3390/s24196183 - 24 Sep 2024
Viewed by 282
Abstract
Automatic detection and recognition of wheel hub text, which can boost the efficiency and accuracy of product information recording, are undermined by the obscurity and orientation variability of text on wheel hubs. To address these issues, this paper constructs a wheel hub text dataset and proposes a wheel hub text detection and recognition model called HubNet. The dataset was captured from real industrial production line scenes and includes 446 images, 934 word instances, and 2947 character instances. HubNet is an end-to-end text detection and recognition model, not only comprising conventional detection and recognition heads but also incorporating a feature cross-fusion module, which improves the accuracy of recognizing wheel hub texts by utilizing both global and local features. Experimental results show that on the wheel hub text dataset, HubNet achieves an accuracy of 86.5%, a recall of 79.4%, and an F1-score of 0.828, and the feature cross-fusion module increases the accuracy by 2% to 4%. The wheel hub dataset and HubNet offer a valuable reference for the automatic detection and recognition of wheel hub text. Full article
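
The listing does not spell out the feature cross-fusion module, so the PyTorch sketch below only illustrates the general idea of letting a pooled global descriptor gate the local feature maps before re-mixing them; every layer size and the gating scheme are assumptions, not HubNet's actual architecture.

```python
# Generic global/local feature fusion: a pooled global gate reweights local maps.
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global context vector
        self.proj = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),                            # per-channel gate in [0, 1]
        )
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, local_feat):
        gate = self.proj(self.pool(local_feat))      # (B, C, 1, 1)
        gated = local_feat * gate                    # globally informed local map
        # Concatenate the gated map with the raw local map, then re-mix
        return self.mix(torch.cat([gated, local_feat], dim=1))

if __name__ == "__main__":
    x = torch.randn(2, 256, 32, 128)                 # e.g. a text-region feature map
    print(GlobalLocalFusion(256)(x).shape)           # torch.Size([2, 256, 32, 128])
```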

19 pages, 17909 KiB  
Article
Nighttime Pothole Detection: A Benchmark
by Min Ling, Quanjun Shi, Xin Zhao, Wenzheng Chen, Wei Wei, Kai Xiao, Zeyu Yang, Hao Zhang, Shuiwang Li, Chenchen Lu and Yufan Zeng
Electronics 2024, 13(19), 3790; https://doi.org/10.3390/electronics13193790 - 24 Sep 2024
Viewed by 364
Abstract
In the field of computer vision, the detection of road potholes at night represents a critical challenge in enhancing the safety of intelligent transportation systems. Ensuring road safety is of paramount importance, particularly in promptly repairing pothole issues. These abrupt road depressions can easily lead to vehicle skidding, loss of control, and even traffic accidents, especially when water has pooled in or submerged the potholes. Therefore, the detection and recognition of road potholes can significantly reduce vehicle damage and the incidence of safety incidents. However, research on road pothole detection lacks high-quality annotated datasets, particularly under low-light conditions at night. To address this issue, this study introduces a novel Nighttime Pothole Dataset (NPD), independently collected and comprising 3831 images that capture diverse scene variations. The construction of this dataset aims to counteract the insufficiency of existing data resources and strives to provide a richer and more realistic benchmark. Additionally, we develop a baseline detector, termed WT-YOLOv8, for the proposed dataset, based on YOLOv8. We also evaluate the performance of the improved WT-YOLOv8 method and eight state-of-the-art object detection methods on the NPD and the COCO dataset. The experimental results on the NPD demonstrate that WT-YOLOv8 achieves a 2.3% improvement in mean Average Precision (mAP) over YOLOv8. In terms of the key metrics, mAP@0.5 and mAP@0.5:0.95, it shows enhancements of 1.5% and 2.8%, respectively, compared to YOLOv8. The experimental results provide valuable insights into each method’s strengths and weaknesses under low-light conditions. This analysis highlights the importance of a specialized dataset for nighttime pothole detection and shows variations in accuracy and robustness among methods, emphasizing the need for improved nighttime pothole detection techniques. The introduction of the NPD is expected to stimulate further research, encouraging the development of advanced algorithms for nighttime pothole detection, ultimately leading to more flexible and reliable road maintenance and road safety. Full article
(This article belongs to the Special Issue New Trends in AI-Assisted Computer Vision)

19 pages, 5285 KiB  
Article
YOLO-TSF: A Small Traffic Sign Detection Algorithm for Foggy Road Scenes
by Rongzhen Li, Yajun Chen, Yu Wang and Chaoyue Sun
Electronics 2024, 13(18), 3744; https://doi.org/10.3390/electronics13183744 - 20 Sep 2024
Viewed by 584
Abstract
The accurate and rapid detection of traffic signs is crucial for intelligent transportation systems. To address the prevalence of small traffic sign targets in road scenes, as well as misdetection, missed detection, and low recognition accuracy under fog, we propose a model for detecting traffic signs in foggy road scenes, YOLO-TSF. Firstly, we design the CCAM attention module and combine it with the idea of local–global residual learning, proposing the LGFFM to enhance the model's recognition capabilities in foggy weather. Secondly, we design MASFFHead by introducing the idea of ASFF to solve the feature loss problem of cross-scale fusion and perform a secondary extraction of small targets. Additionally, we design the NWD-CIoU loss by combining NWD and CIoU to solve the issue of the inadequate learning capacity of IoU for diminutive target features. Finally, to address the dearth of foggy traffic sign datasets, we construct a new foggy traffic sign dataset, Foggy-TT100k. The experimental results show that the mAP@0.5, mAP@0.5:0.95, Precision, and F1-score of YOLO-TSF are improved by 8.8%, 7.8%, 7.1%, and 8.0%, respectively, compared with YOLOv8s, which proves its effectiveness in detecting small traffic signs in foggy scenes with visibility between 50 and 200 m. Full article
(This article belongs to the Section Artificial Intelligence)
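
The NWD-CIoU idea above mixes a Normalized Wasserstein Distance (NWD) term, which is gentler on tiny boxes, into the IoU-based localization loss. A hedged Python sketch of such a combination follows; for brevity the IoU term here is plain IoU rather than CIoU, and the constant C and mixing weight alpha are assumptions rather than the paper's values.

```python
# Sketch of mixing an NWD term with an IoU-based term for small-object boxes.
import math

def iou(b1, b2):
    """b1, b2: boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter + 1e-9)

def nwd(b1, b2, c=12.8):
    """Boxes modelled as 2D Gaussians; NWD = exp(-W2 / C), C an assumed constant."""
    def to_gauss(b):
        cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        w, h = b[2] - b[0], b[3] - b[1]
        return cx, cy, w / 2, h / 2
    g1, g2 = to_gauss(b1), to_gauss(b2)
    w2 = math.sqrt(sum((p - q) ** 2 for p, q in zip(g1, g2)))  # 2-Wasserstein distance
    return math.exp(-w2 / c)

def nwd_iou_loss(pred, gt, alpha=0.5):
    """Weighted mix of the NWD term and the IoU term, both expressed as losses."""
    return alpha * (1.0 - nwd(pred, gt)) + (1.0 - alpha) * (1.0 - iou(pred, gt))

if __name__ == "__main__":
    print(nwd_iou_loss((10, 10, 18, 18), (11, 12, 19, 20)))  # small, shifted boxes
```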

18 pages, 18674 KiB  
Article
An Improved Instance Segmentation Method for Complex Elements of Farm UAV Aerial Survey Images
by Feixiang Lv, Taihong Zhang, Yunjie Zhao, Zhixin Yao and Xinyu Cao
Sensors 2024, 24(18), 5990; https://doi.org/10.3390/s24185990 - 15 Sep 2024
Viewed by 538
Abstract
Farm aerial survey layers can assist in unmanned farm operations, such as planning paths and early warnings. To address the inefficiencies and high costs associated with traditional layer construction, this study proposes a high-precision instance segmentation algorithm based on SparseInst. Considering the structural characteristics of farm elements, this study introduces a multi-scale attention module (MSA) that leverages the properties of atrous convolution to expand the receptive field. It enhances spatial and channel feature weights, effectively improving segmentation accuracy for large-scale and complex targets in the farm through three parallel dense connections. A bottom-up aggregation path is added to the feature pyramid fusion network, enhancing the model’s ability to perceive complex targets such as mechanized trails in farms. Coordinate attention blocks (CAs) are incorporated into the neck to capture richer contextual semantic information, enhancing farm aerial imagery scene recognition accuracy. To assess the proposed method, we compare it against existing mainstream object segmentation models, including the Mask R-CNN, Cascade–Mask, SOLOv2, and Condinst algorithms. The experimental results show that the improved model proposed in this study can be adapted to segment various complex targets in farms. The accuracy of the improved SparseInst model greatly exceeds that of Mask R-CNN and Cascade–Mask and is 10.8 and 12.8 percentage points better than the average accuracy of SOLOv2 and Condinst, respectively, with the smallest number of model parameters. The results show that the model can be used for real-time segmentation of targets under complex farm conditions. Full article
(This article belongs to the Section Intelligent Sensors)
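
The multi-scale attention (MSA) module above is described as parallel atrous convolutions plus spatial/channel reweighting. The exact design is not reproduced here; the PyTorch sketch below is an illustrative stand-in with three dilated branches and a squeeze-and-excitation style gate, where the dilation rates, reduction ratio, and fusion scheme are assumptions.

```python
# Illustrative multi-scale atrous attention block: parallel dilated convs + channel gate.
import torch
import torch.nn as nn

class MultiScaleAtrousAttention(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4), reduction=8):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(len(dilations) * channels, channels, kernel_size=1)
        self.attn = nn.Sequential(                     # squeeze-and-excitation style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        multi = torch.cat([b(x) for b in self.branches], dim=1)  # widened receptive field
        fused = self.fuse(multi)
        return x + fused * self.attn(fused)            # residual, attention-weighted output

if __name__ == "__main__":
    feat = torch.randn(1, 128, 64, 64)                 # a dummy aerial-image feature map
    print(MultiScaleAtrousAttention(128)(feat).shape)  # torch.Size([1, 128, 64, 64])
```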

15 pages, 2064 KiB  
Article
Research on the Depth Image Reconstruction Algorithm Using the Two-Dimensional Kaniadakis Entropy Threshold
by Xianhui Yang, Jianfeng Sun, Le Ma, Xin Zhou, Wei Lu and Sining Li
Sensors 2024, 24(18), 5950; https://doi.org/10.3390/s24185950 - 13 Sep 2024
Viewed by 411
Abstract
Photon-counting laser detection and ranging (LiDAR), especially Geiger-mode avalanche photodiode (Gm-APD) LiDAR, can obtain three-dimensional images of a scene with single-photon sensitivity, but background noise limits the imaging quality of the laser radar. To solve this problem, a depth image estimation method based on two-dimensional (2D) Kaniadakis entropy thresholding is proposed, which transforms a weak-signal extraction problem into a denoising problem for point cloud data. The characteristics of signal peak aggregation in the data and the spatio-temporal correlation features between target image elements in the point cloud-intensity data are exploited. Through extensive simulations and outdoor target-imaging experiments under different signal-to-background ratios (SBRs), the effectiveness of the method under low signal-to-background ratio conditions is demonstrated. When the SBR is 0.025, the proposed method reaches a target recovery rate of 91.7%, which is better than existing typical methods, such as the Peak-picking method, the Cross-Correlation method, and the sparse Poisson intensity reconstruction algorithm (SPIRAL), which achieve target recovery rates of 15.7%, 7.0%, and 18.4%, respectively. Additionally, compared with SPIRAL, the reconstruction recovery ratio is improved by 73.3%. The proposed method greatly improves the integrity of the target under high-background-noise environments and finally provides a basis for feature extraction and target recognition. Full article
(This article belongs to the Special Issue Application of LiDAR Remote Sensing and Mapping)
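
The paper above thresholds in a two-dimensional (gray level and local mean) Kaniadakis entropy space; the sketch below shows only the simpler one-dimensional analogue on a photon-count histogram, to illustrate the kappa-entropy definition and the exhaustive threshold search. The kappa value and the 1D simplification are assumptions.

```python
# 1D Kaniadakis-entropy threshold selection on a histogram (simplified analogue).
import numpy as np

def kaniadakis_entropy(p, kappa=0.5):
    """S_kappa = -sum (p^(1+kappa) - p^(1-kappa)) / (2*kappa); Shannon as kappa -> 0."""
    p = p[p > 0]
    return -np.sum((p ** (1 + kappa) - p ** (1 - kappa)) / (2 * kappa))

def kaniadakis_threshold(hist, kappa=0.5):
    """hist: counts per gray level; returns the level maximizing S_background + S_foreground."""
    prob = hist / hist.sum()
    best_t, best_score = 0, -np.inf
    for t in range(1, len(prob) - 1):
        pb, pf = prob[:t].sum(), prob[t:].sum()
        if pb == 0 or pf == 0:
            continue
        score = (kaniadakis_entropy(prob[:t] / pb, kappa)
                 + kaniadakis_entropy(prob[t:] / pf, kappa))
        if score > best_score:
            best_t, best_score = t, score
    return best_t

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    noise = rng.poisson(3, 5000)                # background photon counts
    signal = rng.poisson(40, 500)               # target returns
    hist = np.bincount(np.concatenate([noise, signal]), minlength=64)
    print("selected threshold:", kaniadakis_threshold(hist))
```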

16 pages, 1822 KiB  
Article
A Pedestrian Detection Network Based on an Attention Mechanism and Pose Information
by Zhaoyin Jiang, Shucheng Huang and Mingxing Li
Appl. Sci. 2024, 14(18), 8214; https://doi.org/10.3390/app14188214 - 12 Sep 2024
Viewed by 357
Abstract
Pedestrian detection has recently attracted widespread attention as a challenging problem in computer vision. The accuracy of pedestrian detection is affected by differences in gestures, background clutter, local occlusion, differences in scales, pixel blur, and other factors occurring in real scenes. These problems lead to false and missed detections. In view of these visual description deficiencies, we leveraged pedestrian pose information as a supplementary resource to address the occlusion challenges that arise in pedestrian detection. An attention mechanism was integrated into the visual information as a supplement to the pose information, because the acquisition of pose information was limited by the pose estimation algorithm. We developed a pedestrian detection method that integrated an attention mechanism with visual and pose information, including pedestrian region generation and pedestrian recognition networks, effectively addressing occlusion and false detection issues. The pedestrian region proposal network was used to generate a series of candidate regions with possible pedestrian targets from the original image. Then, the pedestrian recognition network was used to judge whether each candidate region contained pedestrian targets. The pedestrian recognition network was composed of four parts: visual features, pedestrian poses, pedestrian attention, and classification modules. The visual feature module was responsible for extracting the visual feature descriptions of candidate regions. The pedestrian pose module was used to extract pose feature descriptions. The pedestrian attention module was used to extract attention information, and the classification module was responsible for fusing visual features and pedestrian pose descriptions with the attention mechanism. The experimental results on the Caltech and CityPersons datasets demonstrated that the proposed method could substantially more accurately identify pedestrians than current state-of-the-art methods. Full article
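
The recognition network above fuses visual features, pose features, and an attention signal before classification. Since the concrete layers are not given in this listing, the PyTorch sketch below shows one generic way to do such a fusion: an attention vector computed from the concatenated descriptors reweights the visual features before a two-class head. All dimensions and layer choices are assumptions.

```python
# Generic visual + pose fusion with an attention gate before classification.
import torch
import torch.nn as nn

class VisualPoseFusionClassifier(nn.Module):
    def __init__(self, visual_dim=1024, pose_dim=128):
        super().__init__()
        self.attention = nn.Sequential(              # attention over visual channels
            nn.Linear(visual_dim + pose_dim, visual_dim),
            nn.Sigmoid(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(visual_dim + pose_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 2),                       # pedestrian vs. background
        )

    def forward(self, visual_feat, pose_feat):
        joint = torch.cat([visual_feat, pose_feat], dim=1)
        attended_visual = visual_feat * self.attention(joint)
        return self.classifier(torch.cat([attended_visual, pose_feat], dim=1))

if __name__ == "__main__":
    v = torch.randn(4, 1024)                         # visual features of 4 proposals
    p = torch.randn(4, 128)                          # pose features of the same proposals
    print(VisualPoseFusionClassifier()(v, p).shape)  # torch.Size([4, 2])
```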

20 pages, 7583 KiB  
Article
Object/Scene Recognition Based on a Directional Pixel Voting Descriptor
by Abiel Aguilar-González, Alejandro Medina Santiago and J. A. de Jesús Osuna-Coutiño
Appl. Sci. 2024, 14(18), 8187; https://doi.org/10.3390/app14188187 - 11 Sep 2024
Viewed by 345
Abstract
Detecting objects in images is crucial for several applications, including surveillance, autonomous navigation, and augmented reality. Although AI-based approaches such as Convolutional Neural Networks (CNNs) have proven highly effective in object detection, in scenarios where the objects being recognized are unknown, it is difficult to generalize an AI model for such tasks. In another line of work, feature-based approaches like SIFT, SURF, and ORB offer the capability to search for any object but have limitations under complex visual variations. In this work, we introduce a novel edge-based object/scene recognition method. We propose that utilizing feature edges, instead of feature points, offers high performance under complex visual variations. Our primary contribution is a directional pixel voting descriptor based on image segments. Experimental results are promising; compared to previous approaches, ours demonstrates superior performance under complex visual variations and high processing speed. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
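
The directional pixel voting descriptor itself is not defined in this listing, so the NumPy sketch below captures one plausible reading: edge pixels vote into gradient-direction bins accumulated per image segment, yielding a per-segment orientation histogram. The bin count, edge threshold, and grid segmentation are all assumptions, not the authors' formulation.

```python
# Illustrative edge-orientation voting descriptor over a grid of image segments.
import numpy as np

def directional_voting_descriptor(gray, grid=(4, 4), bins=8, edge_thresh=30.0):
    """gray: 2D float array; returns a (grid_rows * grid_cols * bins) descriptor."""
    gy, gx = np.gradient(gray)
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)                        # gradient direction in [-pi, pi]
    bin_idx = ((direction + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    h, w = gray.shape
    descriptor = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            ys = slice(r * h // grid[0], (r + 1) * h // grid[0])
            xs = slice(c * w // grid[1], (c + 1) * w // grid[1])
            votes = np.zeros(bins)
            mask = magnitude[ys, xs] > edge_thresh        # only edge pixels vote
            np.add.at(votes, bin_idx[ys, xs][mask], 1)
            descriptor.append(votes / (mask.sum() + 1e-9))
    return np.concatenate(descriptor)

if __name__ == "__main__":
    img = np.zeros((64, 64))
    img[:, 32:] = 255.0                                   # a synthetic vertical edge
    d = directional_voting_descriptor(img)
    print(d.shape)                                        # (128,)
```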

17 pages, 6083 KiB  
Article
GFI-YOLOv8: Sika Deer Posture Recognition Target Detection Method Based on YOLOv8
by He Gong, Jingyi Liu, Zhipeng Li, Hang Zhu, Lan Luo, Haoxu Li, Tianli Hu, Ying Guo and Ye Mu
Animals 2024, 14(18), 2640; https://doi.org/10.3390/ani14182640 - 11 Sep 2024
Viewed by 762
Abstract
As the sika deer breeding industry flourishes on a large scale, accurately assessing the health of these animals is of paramount importance. Implementing posture recognition through target detection serves as a vital method for monitoring the well-being of sika deer. This approach allows for a more nuanced understanding of their physical condition, ensuring the industry can maintain high standards of animal welfare and productivity. In order to achieve remote monitoring of sika deer without interfering with the natural behavior of the animals, and to enhance animal welfare, this paper proposes GFI-YOLOv8, an individual sika deer posture recognition and detection algorithm based on YOLOv8. Firstly, this paper proposes to add the iAFF iterative attention feature fusion module to the C2f module of the backbone network, replace the original SPPF module with the AIFI module, and use the attention mechanism to adjust the feature channels adaptively. This aims to enhance feature granularity and improve the model’s recognition and understanding of sika deer behavior in complex scenes. Secondly, a novel convolutional neural network module is introduced to improve the efficiency and accuracy of feature extraction, while preserving the model’s depth and diversity. In addition, a new attention mechanism module is proposed to expand the receptive field and simplify the model. Furthermore, a new pyramid network and an optimized detection head module are presented to improve the recognition and interpretation of sika deer postures in intricate environments. The experimental results demonstrate that the model achieves 91.6% accuracy in recognizing the posture of sika deer, with a 6% improvement in accuracy and a 4.6% increase in mAP50 compared to YOLOv8n. Compared to other models in the YOLO series, such as YOLOv5n, YOLOv7-tiny, YOLOv8n, YOLOv8s, YOLOv9, and YOLOv10, this model exhibits higher accuracy and improved mAP50 and mAP50-95 values. The overall performance is commendable, meeting the requirements for accurate and rapid identification of the posture of sika deer. This model proves beneficial for the precise and real-time monitoring of sika deer posture in complex breeding environments and under all-weather conditions. Full article
(This article belongs to the Section Animal System and Management)

40 pages, 4095 KiB  
Article
An End-to-End Scene Text Recognition for Bilingual Text
by Bayan M. Albalawi, Amani T. Jamal, Lama A. Al Khuzayem and Olaa A. Alsaedi
Big Data Cogn. Comput. 2024, 8(9), 117; https://doi.org/10.3390/bdcc8090117 - 9 Sep 2024
Viewed by 519
Abstract
Text localization and recognition from natural scene images have gained a lot of attention recently due to their crucial role in various applications, such as autonomous driving and intelligent navigation. However, two significant gaps exist in this area: (1) prior research has primarily focused on recognizing English text, whereas Arabic text has been underrepresented, and (2) most prior research has adopted separate approaches for scene text localization and recognition, as opposed to one integrated framework. To address these gaps, we propose a novel bilingual end-to-end approach that localizes and recognizes both Arabic and English text within a single natural scene image. Specifically, our approach utilizes pre-trained CNN models (ResNet and EfficientNetV2) with kernel representation for text localization and RNN models (LSTM and BiLSTM) with an attention mechanism for text recognition. In addition, the AraElectra Arabic language model was incorporated to enhance Arabic text recognition. Experimental results on the EvArest, ICDAR2017, and ICDAR2019 datasets demonstrated that our model achieves superior performance not only in recognizing horizontally oriented text but also in recognizing multi-oriented and curved Arabic and English text in natural scene images. Full article

16 pages, 3440 KiB  
Article
RFF-PoseNet: A 6D Object Pose Estimation Network Based on Robust Feature Fusion in Complex Scenes
by Xiaomei Lei, Wenhuan Lu, Jiu Yong and Jianguo Wei
Electronics 2024, 13(17), 3518; https://doi.org/10.3390/electronics13173518 - 4 Sep 2024
Viewed by 547
Abstract
Six degrees-of-freedom (6D) object pose estimation plays an important role in pattern recognition in fields such as robotics and augmented reality. However, 6D object pose estimation in complex scenes suffers from low accuracy and poor real-time performance. To address these challenges, in this article, RFF-PoseNet (a 6D object pose estimation network based on robust feature fusion) is proposed for complex scenes. Firstly, a more lightweight Ghost module is used to replace the convolutional blocks in the feature extraction network. Then, a pyramid pooling module is added to the semantic label branch of PoseCNN to fuse the features of different pooling layers and enhance the network’s ability to capture information about objects in complex scenes and the correlations among contextual information. Finally, a pose regression and optimization module is utilized to further improve object pose estimation in complex scenes. Simulation experiments conducted on the YCB-Video and Occlusion LineMOD datasets show that the RFF-PoseNet algorithm can strengthen the correlation of features between different levels and the recognition of indistinct targets, thereby achieving excellent accuracy and real-time performance, as well as strong robustness. Full article
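
The pyramid pooling module mentioned above follows a well-known pattern: pool the feature map at several scales, project, upsample, and concatenate with the input. A minimal PyTorch sketch of that pattern is below; the pool sizes and channel counts are assumptions, not the exact RFF-PoseNet configuration.

```python
# Minimal pyramid pooling module: multi-scale average pooling, projection, fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_channels, pool_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_ch = in_channels // len(pool_sizes)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),                     # pool to size x size
                nn.Conv2d(in_channels, branch_ch, kernel_size=1),
                nn.ReLU(inplace=True),
            )
            for size in pool_sizes
        )
        self.project = nn.Conv2d(in_channels + branch_ch * len(pool_sizes),
                                 in_channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        pyramids = [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return self.project(torch.cat([x] + pyramids, dim=1))   # fuse all scales

if __name__ == "__main__":
    feat = torch.randn(1, 512, 30, 40)
    print(PyramidPooling(512)(feat).shape)                      # torch.Size([1, 512, 30, 40])
```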
