Search Results (202)

Search Parameters:
Keywords = 3D skeleton data

19 pages, 2277 KiB  
Article
Vessel Geometry Estimation for Patients with Peripheral Artery Disease
by Hassan Saeed and Andrzej Skalski
Sensors 2024, 24(19), 6441; https://doi.org/10.3390/s24196441 - 4 Oct 2024
Viewed by 783
Abstract
The estimation of vessels’ centerlines is a critical step in assessing vessel geometry, the topological representation of the vessel tree, and vascular network visualization. In this research, we present a novel method for obtaining geometric parameters from peripheral arteries in 3D medical binary volumes. Our approach focuses on centerline extraction, which yields smooth and robust results. The procedure starts with a segmented 3D binary volume, from which a distance map is generated using the Euclidean distance transform. Subsequently, a skeleton is extracted, and seed points and endpoints are identified. A search methodology is used to derive the best path on the skeletonized 3D binary array while tracking from the goal points to the seed point. We use the distance transform to calculate the distance between voxels and the nearest vessel surface, while also addressing bifurcations when vessels divide into multiple branches. The proposed method was evaluated on 22 real cases and 10 synthetically generated vessels. We compared our method to several state-of-the-art approaches and demonstrated its superior performance. The proposed method achieved an average error of 1.382 mm on real patient data and 0.571 mm on synthetic data, both lower than the errors obtained by other state-of-the-art methodologies. This extraction of the centerline facilitates the estimation of multiple geometric parameters of vessels, including radius, curvature, and length.
(This article belongs to the Collection Biomedical Imaging and Sensing)
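As a rough sketch of the idea this abstract describes (a Euclidean distance transform guiding a path search from endpoints back to a seed point), the snippet below builds a voxel graph whose edge costs penalize proximity to the vessel wall, so a shortest path hugs the vessel centre. It assumes numpy, scipy, and networkx, and illustrates the general technique only, not the authors' implementation:

```python
import numpy as np
from scipy import ndimage
import networkx as nx

def trace_centerline(binary_volume, seed, goal):
    """Trace an approximate centerline between two voxels of a binary vessel mask."""
    # Distance from every vessel voxel to the nearest vessel surface.
    dist = ndimage.distance_transform_edt(binary_volume)
    voxels = set(map(tuple, np.argwhere(binary_volume)))
    offsets = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               for dz in (-1, 0, 1) if (dx, dy, dz) != (0, 0, 0)]
    graph = nx.Graph()
    for v in voxels:
        for off in offsets:
            n = (v[0] + off[0], v[1] + off[1], v[2] + off[2])
            if n in voxels:
                # Steps near the medial axis are cheap, steps near the wall costly.
                graph.add_edge(v, n, weight=1.0 / (1e-6 + min(dist[v], dist[n])))
    return nx.shortest_path(graph, tuple(seed), tuple(goal), weight="weight")
```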

14 pages, 905 KiB  
Article
Spatiotemporal Graph Autoencoder Network for Skeleton-Based Human Action Recognition
by Hosam Abduljalil, Ahmed Elhayek, Abdullah Marish Ali and Fawaz Alsolami
AI 2024, 5(3), 1695-1708; https://doi.org/10.3390/ai5030083 - 23 Sep 2024
Viewed by 748
Abstract
Human action recognition (HAR) based on skeleton data is a challenging yet crucial task due to its wide-ranging applications, including patient monitoring, security surveillance, and human–machine interaction. Although numerous algorithms have been proposed to distinguish between various activities, most practical applications require highly accurate detection of specific actions. In this study, we propose a novel, highly accurate spatiotemporal graph autoencoder network for HAR, designated GA-GCN. Furthermore, an extensive investigation was conducted employing diverse modalities. To this end, a spatiotemporal graph autoencoder was constructed to automatically learn both spatial and temporal patterns from skeleton data. The proposed method achieved accuracies of 92.3% and 96.8% on the NTU RGB+D dataset for cross-subject and cross-view evaluations, respectively. On the more challenging NTU RGB+D 120 dataset, GA-GCN attained accuracies of 88.8% and 90.4% for cross-subject and cross-set evaluations. Overall, our model outperforms the majority of existing state-of-the-art methods on these common benchmark datasets.
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
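To make the encode-reconstruct idea behind a spatiotemporal graph autoencoder concrete, here is a deliberately tiny PyTorch sketch: a spatial pass over a joint adjacency matrix followed by a temporal convolution and a reconstruction head. The layer sizes and the identity adjacency are placeholders, not the GA-GCN configuration:

```python
import torch
import torch.nn as nn

class SkeletonGraphAE(nn.Module):
    def __init__(self, num_joints=25, in_ch=3, hidden=64):
        super().__init__()
        self.register_buffer("A", torch.eye(num_joints))  # placeholder adjacency
        self.enc = nn.Linear(in_ch, hidden)
        self.temporal = nn.Conv1d(hidden * num_joints, hidden * num_joints,
                                  kernel_size=9, padding=4, groups=num_joints)
        self.dec = nn.Linear(hidden, in_ch)

    def forward(self, x):                    # x: (batch, time, joints, 3)
        b, t, j, _ = x.shape
        h = torch.einsum("vu,btuc->btvc", self.A, self.enc(x))   # spatial pass
        h = self.temporal(h.reshape(b, t, -1).transpose(1, 2)).transpose(1, 2)
        return self.dec(h.reshape(b, t, j, -1))                  # reconstruction
```

Training such a model to minimize the reconstruction error forces the latent features to capture spatial and temporal regularities of the skeleton, which a classifier head can then exploit.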

18 pages, 16152 KiB  
Article
Characterization of Wing Kinematics by Decoupling Joint Movement in the Pigeon
by Yishi Shen, Shi Zhang, Weimin Huang, Chengrui Shang, Tao Sun and Qing Shi
Biomimetics 2024, 9(9), 555; https://doi.org/10.3390/biomimetics9090555 - 15 Sep 2024
Viewed by 789
Abstract
Birds have remarkable flight capabilities due to their adaptive wing morphology. However, studying live birds is time-consuming and laborious, and obtaining information about the complete wingbeat cycle is difficult. To address this issue and provide a complete dataset, we recorded comprehensive motion-capture wing trajectory data from five free-flying pigeons (Columba livia). Five key motion parameters are used to quantitatively characterize wing kinematics: flapping, sweeping, twisting, folding, and bending. In addition, the forelimb skeleton is mapped using an open-chain three-bar mechanism model. By systematically evaluating the relationships among joint degrees of freedom (DOFs), we configured the model with a 3-DOF shoulder, a 1-DOF elbow, and a 2-DOF wrist. Based on the correlation analysis between wingbeat kinematics and joint movement, we found that the strongly correlated shoulder and wrist roll within the stroke plane drive wing flapping and bending, while the strongly correlated shoulder, elbow, and wrist yaw out of the stroke plane drive wing sweeping and folding. By simplifying the wing morphing, we developed three flapping-wing robots, each with different DOFs inside and outside the stroke plane. This study provides insight into the design of flapping-wing robots capable of mimicking the 3D wing motion of pigeons.
(This article belongs to the Special Issue Biologically Inspired Design and Control of Robots: Second Edition)
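For readers unfamiliar with open-chain joint models, a minimal forward-kinematics sketch of the 3-DOF shoulder / 1-DOF elbow / 2-DOF wrist chain is shown below. The segment lengths, axis conventions, and angle parameterization are invented for illustration and are not the authors' parameters:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def wing_joint_positions(shoulder_xyz, elbow_y, wrist_xz,
                         l_humerus=1.0, l_radius=1.2, l_hand=1.5):
    """Return shoulder, elbow, wrist, and wingtip positions for given joint angles (deg)."""
    r_sh = R.from_euler("xyz", shoulder_xyz, degrees=True)    # 3-DOF shoulder
    r_el = r_sh * R.from_euler("y", elbow_y, degrees=True)    # 1-DOF elbow
    r_wr = r_el * R.from_euler("xz", wrist_xz, degrees=True)  # 2-DOF wrist
    p_sh = np.zeros(3)
    p_el = p_sh + r_sh.apply([l_humerus, 0, 0])
    p_wr = p_el + r_el.apply([l_radius, 0, 0])
    p_tip = p_wr + r_wr.apply([l_hand, 0, 0])
    return p_sh, p_el, p_wr, p_tip
```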

13 pages, 3846 KiB  
Article
3D-STARNET: Spatial–Temporal Attention Residual Network for Robust Action Recognition
by Jun Yang, Shulong Sun, Jiayue Chen, Haizhen Xie, Yan Wang and Zenglong Yang
Appl. Sci. 2024, 14(16), 7154; https://doi.org/10.3390/app14167154 - 15 Aug 2024
Viewed by 991
Abstract
Existing skeleton-based action recognition methods face the challenges of insufficient spatiotemporal feature mining and low efficiency of information transmission. To solve these problems, this paper proposes the Spatial–Temporal Attention Residual Network for 3D human action recognition (3D-STARNET). This model significantly improves action recognition performance through three main innovations: (1) conversion from skeleton points to heatmaps: applying a Gaussian transform to convert skeleton point data into heatmaps effectively reduces the model’s strong dependence on the raw skeleton points and enhances the stability and robustness of the data; (2) a spatiotemporal attention mechanism (STA): a novel spatiotemporal attention mechanism is proposed, focusing on the extraction of key frames and key areas within frames, which significantly enhances the model’s ability to identify behavioral patterns; (3) a multi-stage residual structure (MS-Residual): the introduction of a multi-stage residual structure improves the efficiency of data transmission in the network, mitigates the vanishing-gradient problem in deep networks, and helps improve recognition efficiency. Experimental results on the NTU RGB+D 120 dataset show that 3D-STARNET significantly improves action recognition accuracy, with an overall top-1 accuracy of 96.74%. The method not only addresses the robustness shortcomings of existing approaches but also improves the ability to capture spatiotemporal features, providing an efficient and widely applicable solution for action recognition based on skeletal data.
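The skeleton-to-heatmap conversion in innovation (1) is a standard operation worth seeing in code: each joint becomes a Gaussian blob on its own channel. A minimal sketch follows; the resolution and sigma are illustrative values, not the 3D-STARNET settings:

```python
import numpy as np

def joints_to_heatmaps(joints_xy, height=64, width=64, sigma=2.0):
    """joints_xy: (num_joints, 2) pixel coordinates -> (num_joints, H, W) heatmaps."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((len(joints_xy), height, width), dtype=np.float32)
    for k, (x, y) in enumerate(joints_xy):
        # Gaussian blob centred on the joint; nearby pixels get mass, so small
        # tracking jitter barely changes the input, which is the robustness gain.
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps
```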

23 pages, 10680 KiB  
Article
Multi-Teacher D-S Fusion for Semi-Supervised SAR Ship Detection
by Xinzheng Zhang, Jinlin Li, Chao Li and Guojin Liu
Remote Sens. 2024, 16(15), 2759; https://doi.org/10.3390/rs16152759 - 28 Jul 2024
Viewed by 987
Abstract
Ship detection from synthetic aperture radar (SAR) imagery is crucial for various real-world applications. Numerous deep learning-based detectors have been investigated for SAR ship detection, which requires a substantial amount of labeled data for training. However, SAR data annotation is time-consuming and demands specialized expertise, so deep learning-based SAR ship detectors struggle from a lack of annotations. With limited labeled data, semi-supervised learning is a popular approach for boosting detection performance by excavating valuable information from unlabeled data. In this paper, a semi-supervised SAR ship detection network is proposed, termed the Multi-Teacher Dempster–Shafer Evidence Fusion Network (MTDSEFN). The MTDSEFN is an enhanced framework based on the basic teacher–student skeleton, comprising two branches: the Teacher Group (TG) and the Agency Teacher (AT). The TG utilizes multiple teachers to generate pseudo-labels for differently augmented versions of unlabeled samples, which are then refined into high-quality pseudo-labels by Dempster–Shafer (D-S) fusion. The AT not only delivers the weights of its own teacher to the TG at the end of each epoch but also updates its own weights after each iteration, enabling the model to effectively learn rich information from unlabeled data. The combination of TG and AT guarantees both reliable pseudo-label generation and a comprehensive diversity of learning information from numerous unlabeled samples. Extensive experiments were performed on two public SAR ship datasets, and the results demonstrated the effectiveness and superiority of the proposed approach.
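Dempster–Shafer fusion itself is compact enough to show. The toy below applies Dempster's rule of combination to two mass functions over class hypotheses, sketching how evidence from multiple teachers could be merged; the mass assignments are invented for illustration and this is not the MTDSEFN pseudo-label refinement itself:

```python
def ds_combine(m1, m2):
    """Combine two mass functions (dicts: frozenset of labels -> mass) by Dempster's rule."""
    fused, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                fused[inter] = fused.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb          # mass on contradictory hypotheses
    norm = 1.0 - conflict                    # renormalize away the conflict
    return {k: v / norm for k, v in fused.items()}

# e.g., two teachers both leaning toward "ship" with some uncertainty:
t1 = {frozenset({"ship"}): 0.7, frozenset({"ship", "clutter"}): 0.3}
t2 = {frozenset({"ship"}): 0.6, frozenset({"ship", "clutter"}): 0.4}
print(ds_combine(t1, t2))   # mass concentrates on {"ship"}: 0.88 vs 0.12
```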

23 pages, 6371 KiB  
Article
Fall Detection Method for Infrared Videos Based on Spatial-Temporal Graph Convolutional Network
by Junkai Yang, Yuqing He, Jingxuan Zhu, Zitao Lv and Weiqi Jin
Sensors 2024, 24(14), 4647; https://doi.org/10.3390/s24144647 - 17 Jul 2024
Viewed by 848
Abstract
The timely detection of falls and alerting of medical aid is critical for health monitoring of elderly individuals living alone. This paper focuses on the poor adaptability, privacy infringement, and low recognition accuracy associated with traditional visual-sensor-based fall detection. We propose an infrared video-based fall detection method utilizing spatial-temporal graph convolutional networks (ST-GCNs) to address these challenges. Our method uses a fine-tuned AlphaPose to extract 2D human skeleton sequences from infrared videos. The skeleton data are then represented in Cartesian and polar coordinates and processed through a two-stream ST-GCN to recognize fall behaviors promptly. To enhance the network’s ability to recognize fall actions, we improved the adjacency matrix of the graph convolutional units and introduced multi-scale temporal graph convolution units. To facilitate practical deployment, we optimized the time window and network depth of the ST-GCN, striking a balance between model accuracy and speed. Experimental results on a proprietary infrared human action recognition dataset demonstrate that the proposed algorithm accurately identifies fall behaviors with an accuracy of up to 96%. Moreover, the algorithm performs robustly, identifying falls in both near-infrared and thermal-infrared videos.
(This article belongs to the Special Issue Multi-Modal Data Sensing and Processing)
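The two-stream input idea is easy to illustrate: the same 2D skeleton expressed once in Cartesian and once in polar coordinates about a root joint. A small sketch, with the root-joint choice as an assumption for illustration:

```python
import numpy as np

def to_polar(skeleton_xy, root_index=0):
    """skeleton_xy: (num_joints, 2) -> (num_joints, 2) of (radius, angle) about the root."""
    rel = skeleton_xy - skeleton_xy[root_index]
    radius = np.linalg.norm(rel, axis=1)
    angle = np.arctan2(rel[:, 1], rel[:, 0])
    return np.stack([radius, angle], axis=1)
```

Radial distance and angle change distinctively during a fall (the body rotates toward horizontal while collapsing toward the ground), which is the intuition for feeding the second stream.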

18 pages, 7778 KiB  
Article
Convolutional Block Attention Module–Multimodal Feature-Fusion Action Recognition: Enabling Miner Unsafe Action Recognition
by Yu Wang, Xiaoqing Chen, Jiaoqun Li and Zengxiang Lu
Sensors 2024, 24(14), 4557; https://doi.org/10.3390/s24144557 - 14 Jul 2024
Cited by 2 | Viewed by 965
Abstract
Unsafe actions by miners are one of the main causes of mine accidents. Computer-vision-based research on recognizing unsafe actions of underground miners enables relatively accurate real-time recognition of such actions. A dataset called Unsafe Actions of Underground Miners (UAUM) was constructed, comprising ten categories of unsafe actions. Underground images were enhanced using spatial- and frequency-domain enhancement algorithms. A combination of the YOLOX object detection algorithm and the Lite-HRNet human keypoint detection algorithm was utilized to obtain skeleton modal data. The CBAM-PoseC3D model, a skeleton-modality action recognition model incorporating the CBAM attention module, was proposed and combined with the RGB-modality feature extraction model CBAM-SlowOnly. Together these form the Convolutional Block Attention Module–Multimodal Feature-Fusion Action Recognition (CBAM-MFFAR) model for recognizing unsafe actions of underground miners. The improved CBAM-MFFAR model achieved a recognition accuracy of 95.8% on the NTU RGB+D 60 public dataset under the X-Sub benchmark, improving on the CBAM-PoseC3D, PoseC3D, 2S-AGCN, and ST-GCN models by 2%, 2.7%, 7.3%, and 14.3%, respectively. On the UAUM dataset, the CBAM-MFFAR model achieved a recognition accuracy of 94.6%, with improvements of 2.6%, 4%, 12%, and 17.3% over the same models. In field validation at mining sites, the CBAM-MFFAR model accurately recognized similar and multiple unsafe actions among underground miners.
(This article belongs to the Section Intelligent Sensors)
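The CBAM block the model names is the published channel-then-spatial attention design; a compact PyTorch rendering follows to unpack the acronym. It follows the standard CBAM formulation, not the exact CBAM-MFFAR integration:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                                   # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))                  # channel attention from
        mx = self.mlp(x.amax(dim=(2, 3)))                   # avg- and max-pooled stats
        x = x * torch.sigmoid(avg + mx)[:, :, None, None]
        s = torch.cat([x.mean(1, keepdim=True),             # spatial attention map
                       x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```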

25 pages, 7113 KiB  
Article
LidPose: Real-Time 3D Human Pose Estimation in Sparse Lidar Point Clouds with Non-Repetitive Circular Scanning Pattern
by Lóránt Kovács, Balázs M. Bódis and Csaba Benedek
Sensors 2024, 24(11), 3427; https://doi.org/10.3390/s24113427 - 26 May 2024
Cited by 2 | Viewed by 1529
Abstract
In this paper, we propose LidPose, a novel vision-transformer-based end-to-end pose estimation method for real-time human skeleton estimation in non-repetitive circular scanning (NRCS) lidar point clouds. Building on the ViTPose architecture, we introduce novel adaptations to address the unique properties of NRCS lidars, namely their sparsity and unusual rosette-like scanning pattern. The proposed method addresses a common issue of NRCS lidar-based perception, the sparsity of the measurement, which requires balancing the spatial and temporal resolution of the recorded data for efficient analysis of various phenomena. LidPose utilizes foreground and background segmentation for the NRCS lidar sensor to select a region of interest (RoI), making LidPose a complete end-to-end approach for moving-pedestrian detection and skeleton fitting from raw NRCS lidar measurement sequences captured by a static sensor in surveillance scenarios. To evaluate the method, we created a novel, real-world, multi-modal dataset containing camera images and lidar point clouds from a Livox Avia sensor, with annotated 2D and 3D human skeleton ground truth.
(This article belongs to the Section Optical Sensors)
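LidPose's segmentation is part of its learned pipeline; as a loose illustration of what foreground selection for a static lidar can look like, here is a simple voxel background-subtraction sketch. The voxel size is an illustrative assumption:

```python
import numpy as np

def foreground_points(frame_pts, background_pts, voxel=0.1):
    """frame_pts, background_pts: (N, 3) arrays -> foreground subset of frame_pts."""
    # Voxels occupied in a background scan of the empty scene are discarded,
    # keeping only points from moving objects such as pedestrians.
    bg = {tuple(v) for v in np.floor(background_pts / voxel).astype(int)}
    keys = np.floor(frame_pts / voxel).astype(int)
    mask = np.array([tuple(k) not in bg for k in keys])
    return frame_pts[mask]
```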

21 pages, 7158 KiB  
Article
Exploring High-Order Skeleton Correlations with Physical and Non-Physical Connection for Action Recognition
by Cheng Wang, Nan Ma and Zhixuan Wu
Appl. Sci. 2024, 14(9), 3832; https://doi.org/10.3390/app14093832 - 30 Apr 2024
Viewed by 779
Abstract
Hypergraphs have received widespread attention for modeling complex data correlations due to their superior performance. In recent years, some researchers have used hypergraph structures to characterize complex non-pairwise relationships among joints of the human skeleton and to model its higher-order correlations. However, traditional methods of constructing hypergraphs from physical connections ignore the dependencies among non-physically connected joints or bones, and they struggle to model correlations among joints or bones that are highly correlated during human actions yet physically far apart. To address these issues, we propose a skeleton-based action recognition method using hypergraph learning based on skeleton correlations, which explores the effects of physically and non-physically connected skeleton information on accurate action recognition. Specifically, spatio-temporal correlation modeling is performed both on the natural connections inherent in humans (physical connections) and on joints or bones that are strongly interdependent but not directly connected (non-physical connections) during human actions. To better learn the hypergraph structure, we construct a spatio-temporal hypergraph neural network to extract the higher-order correlations of the human skeleton. In addition, we use an attention mechanism to compute attention weights among different hypergraph features and adaptively fuse the rich feature information in different hypergraphs. Extensive experiments conducted on two datasets, NTU-RGB+D 60 and Kinetics-Skeleton, show that compared with state-of-the-art skeleton-based methods, the proposed method achieves optimal performance with significant advantages, providing more accurate environmental perception and action analysis for the development of embodied intelligence.
(This article belongs to the Special Issue Autonomous Vehicles and Robotics)
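The data structure at the heart of this is the hypergraph incidence matrix, in which physical limb chains and non-physical co-moving joint groups can coexist as hyperedges. A sketch follows; the joint groupings are invented for illustration and are not the paper's learned structure:

```python
import numpy as np

def incidence_matrix(num_joints, hyperedges):
    """hyperedges: list of joint-index lists -> (num_joints, num_edges) 0/1 matrix."""
    H = np.zeros((num_joints, len(hyperedges)))
    for e, joints in enumerate(hyperedges):
        H[joints, e] = 1.0        # each hyperedge connects an arbitrary joint set
    return H

physical = [[0, 1, 2], [2, 3, 4]]       # e.g., limb chains from the kinematic tree
non_physical = [[4, 9], [0, 7, 9]]      # e.g., hands and head co-moving in an action
H = incidence_matrix(10, physical + non_physical)
```

Unlike an ordinary adjacency matrix, a single hyperedge here relates three or more joints at once, which is what "higher-order correlation" refers to.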

15 pages, 1207 KiB  
Article
From Movements to Metrics: Evaluating Explainable AI Methods in Skeleton-Based Human Activity Recognition
by Kimji N. Pellano, Inga Strümke and Espen A. F. Ihlen
Sensors 2024, 24(6), 1940; https://doi.org/10.3390/s24061940 - 18 Mar 2024
Viewed by 1267
Abstract
The advancement of deep learning in human activity recognition (HAR) using 3D skeleton data is critical for applications in healthcare, security, sports, and human–computer interaction. This paper tackles a well-known gap in the field: the lack of testing of the applicability and reliability of XAI evaluation metrics in the skeleton-based HAR domain. We tested established XAI metrics, namely faithfulness and stability, on Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM) to address this problem. This study introduces a perturbation method that produces variations within the error tolerance of motion-sensor tracking, ensuring the resulting skeletal data points remain within the plausible output range of human movement as captured by the tracking device. We used the NTU RGB+D 60 dataset and the EfficientGCN architecture for HAR model training and testing. The evaluation involved systematically perturbing the 3D skeleton data by applying controlled displacements at different magnitudes to assess the impact on XAI metric performance across multiple action classes. Our findings reveal that faithfulness may not consistently serve as a reliable metric across all classes for the EfficientGCN model, indicating its limited applicability in certain contexts. In contrast, stability proves to be a more robust metric, showing dependability across different perturbation magnitudes. Additionally, CAM and Grad-CAM yielded almost identical explanations, leading to closely similar metric outcomes. This suggests a need to explore additional metrics and apply more diverse XAI methods to broaden the understanding and effectiveness of XAI in skeleton-based HAR.
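A minimal sketch of the bounded-perturbation idea: jitter each 3D joint with noise whose displacement is capped at a tracking-error tolerance, so perturbed poses stay within what the sensor could plausibly have reported. The tolerance and magnitude values are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def perturb_skeleton(seq, magnitude=0.01, tolerance=0.05, seed=0):
    """seq: (frames, joints, 3) in metres -> perturbed copy, per-joint shift <= tolerance."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=magnitude, size=seq.shape)
    norms = np.linalg.norm(noise, axis=-1, keepdims=True)
    # Rescale any displacement that exceeds the sensor's error tolerance.
    noise = np.where(norms > tolerance, noise * (tolerance / norms), noise)
    return seq + noise
```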

20 pages, 7751 KiB  
Article
SCGFormer: Semantic Chebyshev Graph Convolution Transformer for 3D Human Pose Estimation
by Jiayao Liang and Mengxiao Yin
Appl. Sci. 2024, 14(4), 1646; https://doi.org/10.3390/app14041646 - 18 Feb 2024
Viewed by 1363
Abstract
With the rapid advancement of deep learning, 3D human pose estimation has largely freed itself from reliance on manual annotation, and the effective utilization of joint features has become significant. Effectively leveraging 2D human joint information to predict 3D human skeletons is of paramount importance and can improve the accuracy of 3D skeleton prediction. In this paper, we propose the SCGFormer model to reduce the error in predicting human skeletal poses in three-dimensional space. The network architecture of SCGFormer encompasses a Transformer and two distinct types of graph convolution, organized into two interconnected modules: SGraAttention and AcChebGconv. SGraAttention extracts global feature information from each 2D human joint, augmenting local feature learning by integrating prior knowledge of human joint relationships. Simultaneously, AcChebGconv broadens the receptive field for graph-structure information and constructs implicit joint relationships to aggregate more valuable adjacent features. SCGFormer is tested on the widely recognized benchmark datasets Human3.6M and MPI-INF-3DHP and achieves excellent results. In particular, on Human3.6M, our method achieves the best results in 9 of 15 actions, with an overall average error reduction of about 1.5 points compared with state-of-the-art methods, demonstrating the excellent performance of SCGFormer.
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)
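To unpack the "ChebGconv" ingredient in the module name, here is a plain Chebyshev graph convolution in PyTorch, using the textbook recurrence T0 = x, T1 = Lx, T(k+1) = 2L·Tk − T(k−1) on a rescaled Laplacian. This is the standard formulation, not the paper's AcChebGconv variant:

```python
import torch
import torch.nn as nn

class ChebConv(nn.Module):
    def __init__(self, in_ch, out_ch, K, L_scaled):
        super().__init__()
        # L_scaled = 2L / lambda_max - I, so its eigenvalues lie in [-1, 1].
        self.register_buffer("L", L_scaled)
        self.theta = nn.ModuleList(nn.Linear(in_ch, out_ch, bias=False) for _ in range(K))

    def forward(self, x):                        # x: (batch, nodes, in_ch)
        t_prev = x                               # T0
        t_curr = torch.einsum("vu,buc->bvc", self.L, x)   # T1
        out = self.theta[0](t_prev)
        for k in range(1, len(self.theta)):
            out = out + self.theta[k](t_curr)
            # Chebyshev recurrence widens the receptive field by one hop per order.
            t_prev, t_curr = t_curr, 2 * torch.einsum("vu,buc->bvc", self.L, t_curr) - t_prev
        return out
```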

24 pages, 4112 KiB  
Article
Enhancing Human Action Recognition with 3D Skeleton Data: A Comprehensive Study of Deep Learning and Data Augmentation
by Chu Xin, Seokhwan Kim, Yongjoo Cho and Kyoung Shin Park
Electronics 2024, 13(4), 747; https://doi.org/10.3390/electronics13040747 - 13 Feb 2024
Cited by 1 | Viewed by 2122
Abstract
Human Action Recognition (HAR) is an important field that identifies human behavior through sensor data. Three-dimensional human skeleton data extracted from the Kinect depth sensor have emerged as a powerful alternative that mitigates the lighting and occlusion effects of traditional 2D RGB or grayscale image-based HAR. Data augmentation is a key technique for enhancing model generalization and robustness in deep learning while suppressing overfitting to the training data. In this paper, we conduct a comprehensive study of various data augmentation techniques specific to skeletal data, which aim to improve the accuracy of deep learning models. These augmentation methods include spatial augmentation, which generates augmented samples from the original 3D skeleton sequence, and temporal augmentation, which is designed to capture subtle temporal changes in motion. The evaluation covers two publicly available datasets and a proprietary dataset and employs three neural network models. The results highlight the impact of temporal augmentation on model performance on the skeleton datasets, while exhibiting the more nuanced impact of spatial augmentation. These findings underscore the importance of tailoring augmentation strategies to specific dataset characteristics and actions, providing novel perspectives for model selection in skeleton-based human action recognition tasks.
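Two tiny examples in the spirit of the spatial/temporal split the abstract describes: a whole-skeleton rotation (spatial) and a crop-resample along time (temporal). The angle range and crop ratio are illustrative assumptions, not the study's parameters:

```python
import numpy as np

def spatial_rotate(seq, max_deg=15, seed=0):
    """seq: (frames, joints, 3); rotate all frames about the vertical (y) axis."""
    a = np.radians(np.random.default_rng(seed).uniform(-max_deg, max_deg))
    rot = np.array([[np.cos(a), 0, np.sin(a)],
                    [0, 1, 0],
                    [-np.sin(a), 0, np.cos(a)]])
    return seq @ rot.T

def temporal_crop(seq, ratio=0.9, seed=0):
    """Crop a random contiguous window and linearly resample to the original length."""
    t = len(seq)
    win = max(2, int(t * ratio))
    start = np.random.default_rng(seed).integers(0, t - win + 1)
    idx = np.linspace(start, start + win - 1, t)
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, t - 1)
    w = (idx - lo)[:, None, None]
    return (1 - w) * seq[lo] + w * seq[hi]   # interpolated, slightly time-warped copy
```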

18 pages, 4329 KiB  
Article
Advancing Human Motion Recognition with SkeletonCLIP++: Weighted Video Feature Integration and Enhanced Contrastive Sample Discrimination
by Lin Yuan, Zhen He, Qiang Wang and Leiyang Xu
Sensors 2024, 24(4), 1189; https://doi.org/10.3390/s24041189 - 11 Feb 2024
Viewed by 1038
Abstract
This paper introduces SkeletonCLIP++, an extension of our prior work in human action recognition that emphasizes the use of semantic information beyond traditional label-based methods. The first innovation, Weighted Frame Integration (WFI), shifts video feature computation from simple averaging to a weighted-frame approach, enabling a more nuanced representation of human movements in line with semantic relevance. Another key development, Contrastive Sample Identification (CSI), introduces a novel discriminative task within the model: identifying the most similar negative sample among positive ones, which enhances the model’s ability to distinguish between closely related actions. Finally, BERT Text Encoder Integration (BTEI) leverages the pre-trained BERT model as our text encoder to refine the model’s performance. Empirical evaluations on the HMDB-51, UCF-101, and NTU RGB+D 60 datasets show improvements, especially on smaller datasets. SkeletonCLIP++ thus offers a refined approach to human action recognition, ensuring semantic integrity and detailed differentiation in video data analysis.
(This article belongs to the Special Issue Smart Sensing Technology for Human Activity Recognition)
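A hedged sketch of the weighted-frame idea: instead of a plain mean over frame features, each frame is weighted by its similarity to a text embedding before pooling. Feature dimensions and the temperature are placeholders, not the SkeletonCLIP++ configuration:

```python
import torch

def weighted_frame_integration(frame_feats, text_feat, temperature=0.07):
    """frame_feats: (T, D); text_feat: (D,) -> (D,) weighted video feature."""
    sims = frame_feats @ text_feat / temperature     # per-frame semantic relevance
    weights = torch.softmax(sims, dim=0)             # frames matching the text dominate
    return (weights[:, None] * frame_feats).sum(dim=0)

video_feat = weighted_frame_integration(torch.randn(32, 512), torch.randn(512))
```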

15 pages, 2566 KiB  
Article
A Low-Cost Inertial Measurement Unit Motion Capture System for Operation Posture Collection and Recognition
by Mingyue Yin, Jianguang Li and Tiancong Wang
Sensors 2024, 24(2), 686; https://doi.org/10.3390/s24020686 - 21 Jan 2024
Cited by 4 | Viewed by 2009
Abstract
In factories, human posture recognition facilitates human–machine collaboration, human risk management, and workflow improvement. Compared with optical sensors, inertial sensors have the advantages of portability and resistance to occlusion, making them suitable for factories. However, existing product-level inertial sensing solutions are generally expensive. This paper proposes a low-cost human motion capture system based on the BMI160, a six-axis inertial measurement unit (IMU). The data, collected over WIFI communication, are processed to obtain the rotation angles of human joints around the XYZ axes and the displacements along the XYZ directions; the human skeleton hierarchy is then used to calculate the real-time human posture. Furthermore, a digital human model was established in Unity3D to synchronously visualize and present human movements. We simulated assembly operations in a virtual reality environment for human posture data collection and posture recognition experiments. Six inertial sensors were placed on the chest, waist, knee joints, and ankle joints of both legs. A total of 16,067 labeled samples were obtained for training the posture recognition model, with the accumulated displacement and rotation angle of the six joints in the three directions used as input features. A bi-directional long short-term memory (BiLSTM) model was used to identify seven common operation postures: standing, slightly bending, deep bending, half-squatting, squatting, sitting, and supine, with an average accuracy of 98.24%. According to the experimental results, the proposed method can be used to develop a low-cost and effective solution for human posture recognition in factory operations.
(This article belongs to the Special Issue Advanced Sensors for Real-Time Monitoring Applications II)
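A minimal BiLSTM classifier over per-frame joint features, mirroring the input described above (displacements and rotation angles of six joints in three directions, i.e., 36 features per frame); the hidden size is an assumption for illustration:

```python
import torch
import torch.nn as nn

class PostureBiLSTM(nn.Module):
    def __init__(self, in_feats=36, hidden=64, num_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(in_feats, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)   # forward + backward states

    def forward(self, x):              # x: (batch, time, 36)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # classify from the final time step

logits = PostureBiLSTM()(torch.randn(8, 100, 36))   # -> (8, 7) class scores
```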

27 pages, 5246 KiB  
Article
On the Evaluation of Diverse Vision Systems towards Detecting Human Pose in Collaborative Robot Applications
by Aswin K. Ramasubramanian, Marios Kazasidis, Barry Fay and Nikolaos Papakostas
Sensors 2024, 24(2), 578; https://doi.org/10.3390/s24020578 - 17 Jan 2024
Cited by 3 | Viewed by 1620
Abstract
Tracking human operators working in the vicinity of collaborative robots can improve the design of safety architecture, ergonomics, and the execution of assembly tasks in a human–robot collaboration scenario. Three commercial spatial computation kits were used, along with their Software Development Kits providing various real-time functionalities for tracking human poses. The paper explores the possibility of combining the capabilities of different hardware systems and software frameworks, which may lead to better performance and accuracy in detecting the human pose in collaborative robotic applications. This study assessed their performance in two different human poses at six depth levels, comparing the raw data and noise-reduced filtered data. In addition, a laser measurement device was employed as a ground-truth indicator, together with the average root mean square error as an error metric. The obtained results were analysed and compared in terms of positional accuracy and repeatability, indicating the dependence of the sensors’ performance on the tracking distance. A Kalman-based filter was applied to fuse the human skeleton data and then reconstruct the operator’s poses, taking into account the sensors’ performance in different distance zones. The results indicated that at distances below 3 m, the Microsoft Azure Kinect demonstrated the best tracking performance, followed by the Intel RealSense D455 and the Stereolabs ZED2, while at ranges beyond 3 m, the ZED2 had superior tracking performance.
(This article belongs to the Special Issue Multi-sensor for Human Activity Recognition: 2nd Edition)
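As a toy illustration of distance-aware sensor fusion, the snippet below runs a per-joint Kalman update over position measurements from two sensors with different noise variances, so the sensor trusted for the current distance zone dominates. The scalar-variance, static-position model and the noise values are invented placeholders, not the paper's filter:

```python
import numpy as np

def kalman_fuse(x, P, measurements, variances):
    """x: (3,) joint position; P: scalar variance; fuse each (z, r) measurement in turn."""
    for z, r in zip(measurements, variances):
        k = P / (P + r)               # Kalman gain: low-noise sensors pull harder
        x = x + k * (z - x)
        P = (1 - k) * P
    return x, P

joint = np.array([0.5, 1.2, 2.8])
fused, var = kalman_fuse(joint, P=0.04,
                         measurements=[joint + 0.02, joint - 0.05],
                         variances=[0.01, 0.09])   # first sensor trusted more
```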
