Object Detection Based
ABSTRACT
In modern intelligent manufacturing, human-robot collaboration is essential for combining the advantages of robots and
humans to facilitate mass-customized production. To improve the robot's understanding of the operator's movements
and the working environment, an intelligent recognition method based on an improved RT-DETR is proposed. The method
introduces CBAM modules before the multi-scale recognition layer to improve the recognition accuracy of the model.
To offset the increase in parameter count and computational complexity caused by
the added CBAM modules, the Conv module of the backbone network is replaced by the GhostConv module.
Experimental results show that these two enhancements (CBAM and GhostConv) effectively improve the detection
performance of the original RT-DETR model on the dataset examined.
Keywords: Human-robot collaboration, RT-DETR model, Detection performance, GhostConv method
1 INTRODUCTION
To meet the diverse demands of today's market more efficiently, modern companies strive for large-scale customized
production[1]. Human-robot collaboration is an effective approach to achieving this goal: it combines robots’
capability for repetitive, high-precision work with human workers’ advantage in executing less standardized, flexible and
versatile operations[2], thus improving productivity. To enable robots to assist the operator more proactively, researchers
have used deep learning to recognize human gestures[3], postures[4], and voice. The recognition results are then fed back to
robots, enabling them to realign and plan their movements accordingly. However, current research often focuses solely on
the worker, neglecting the crucial interaction between the worker and the objects being processed. It is important to note that most
actions of the worker start with approaching and contacting the object. Therefore, recognizing the worker’s contact with the
object is important for helping robots predict the worker’s next movement, so that robots can collaborate with the worker
effectively, efficiently and safely.
An intelligent monitoring method for human-robot collaboration is proposed in this paper. The remainder of this paper is
organized as follows. Section 2 analyses the research problem. Section 3 outlines the proposed methodology,
which uses an improved RT-DETR algorithm to achieve real-time detection of the worker's hand and the object. Section 4 verifies the
feasibility of the proposed method through experiments. A conclusion is given in Section 5.
2 RESEARCH PROBLEMS
As discussed above, the rapid development of mass customization has led to the widespread adoption of human-robot
collaboration in production [6-8]. To enhance the intelligence and flexibility of robots, researchers are focusing on the
automatic and intelligent recognition of operators’ movements and intentions [5], with the aim of enhancing collaboration
between robots and human workers. In recent years, deep learning has made significant progress in computer vision,
providing new ways to study the actions and intentions of workers. However, using deep learning methods to help robots
track the real-time status of workers faces several challenges. First, training the algorithm requires a large amount of
data, yet datasets for specific scenarios and environments are limited. Second, accurate detection is not an easy task, as the objects
to be identified have different shapes, sizes, and visual characteristics.
Fourth International Conference on Mechanical, Electronics, and Electrical and Automation Control (METMS 2024),
edited by Zeashan Hameed Khan, Junxing Zhang, Pengfei Zeng, Proc. of SPIE Vol. 13163,
131638P · © 2024 SPIE · 0277-786X · doi: 10.1117/12.3030128
4 EXPERIMENT
4.1 Dataset
Datasets for detecting relevant targets in human-robot collaboration are limited, and the targets to be detected vary
depending on the task. To address this issue, we selected additional images from the 11k hand dataset[12], including
images of the operator's hand, to expand the sample set. We further expanded the dataset using data augmentation
methods. The expanded dataset comprises 2500 images, labelled as worker_hand, robot_hand and object using
the MakeSense platform[13]. The dataset is divided into training, testing, and validation sets in an 8:1:1 ratio.
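The 8:1:1 division described above can be sketched as follows. This is a minimal illustration, not the authors' script: the file names, the fixed random seed and the `split_dataset` helper are assumptions introduced here, and the paper does not state how the images were shuffled or assigned.

```python
import random

def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into train/test/val subsets by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded shuffle so the split is repeatable
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_test = len(items) * ratios[1] // total
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # any rounding remainder goes to validation
    return train, test, val

# Hypothetical file names standing in for the 2500 labelled images.
image_names = [f"img_{i:04d}.jpg" for i in range(2500)]
train, test, val = split_dataset(image_names)
print(len(train), len(test), len(val))  # prints "2000 250 250"
```

With 2500 images, the ratio yields 2000 training, 250 testing and 250 validation images. Shuffling before slicing keeps each subset from inheriting the ordering of the source collections.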
4.2 Model training
The experiments in this paper are conducted on the Ubuntu 18.04 operating system with an NVIDIA GeForce RTX 3080
graphics card. PyTorch 1.7.0 is used as the deep learning framework, with CUDA 11.0 and Python 3.8 as the
programming environment. The left side of Fig.4 displays the training loss curve over 250 epochs. Because the model
employs positional encoding, the loss is high at the beginning. As the number of epochs increases, the training loss levels
off after the 50th epoch and converges after 100 epochs. No underfitting or overfitting is observed during training.
4.3 Analysis of test results
To verify the effectiveness of the improved algorithm, three sets of ablation experiments are performed, and the results are
shown in Table 1. We employ precision (P), recall (R), and mean average precision at an IoU threshold of 0.5 (mAP0.5) as the evaluation
criteria. As can be seen from Table 1, after using GhostConv, P is improved by 3.55% compared to RT-DETR; after
introducing the CBAM attention mechanism, although the recognition accuracy is lower than with GhostConv,
the feature extraction ability of the model is improved, and the recall is increased by 1.4% compared to RT-DETR; by
combining GhostConv and the CBAM attention mechanism, the detection accuracy (P), recall and mAP of the improved
RT-DETR are increased by 1.2%, 0.6% and 0.8%, respectively, compared to RT-DETR, indicating that the combination of
these two methods contributes to improving the detection performance of the model.
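As a reminder of how P and R are obtained at an IoU threshold of 0.5, the following sketch computes both for a single image and a single class. It uses the standard definitions P = TP / (TP + FP) and R = TP / (TP + FN); the greedy, score-ordered matching rule and the example boxes are assumptions made for illustration and are not taken from the paper's evaluation code. mAP0.5 additionally averages the area under the precision-recall curve over all classes, which this sketch does not cover.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(predictions, ground_truths, iou_threshold=0.5):
    """predictions: list of (score, box); ground_truths: list of boxes (one class, one image)."""
    matched = set()  # indices of ground-truth boxes already claimed by a prediction
    tp = 0
    # Visit predictions from most to least confident (assumed matching order).
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(predictions) - tp    # predictions without a matching ground truth
    fn = len(ground_truths) - tp  # ground truths that no prediction matched
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truths else 0.0
    return precision, recall

# Hypothetical boxes: two hands in the image, three predictions.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (1, 1, 10, 10)), (0.8, (50, 50, 60, 60)), (0.6, (20, 20, 30, 31))]
p, r = precision_recall(preds, gts)
```

In this example two predictions match a ground-truth box and one does not, so P is 2/3 and R is 1.0.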
Table 1 Ablation experiment results.
5 CONCLUSION
To improve robots’ understanding of workers' actions in human-robot collaboration, this paper proposes an
intelligent recognition method that improves RT-DETR to recognize the operator's hand and the object. First, the Conv module
of the backbone network is replaced by the GhostConv module to reduce the number of model parameters and the computational
complexity. In addition, the CBAM attention mechanism is added before the detection layer to highlight important
information in the feature maps at different scales and make the predictions more accurate. Experimental results show
that, compared with the original RT-DETR model, the detection accuracy of the improved model is increased by 1.2% and
the recall rate by 0.6%, providing a more accurate basis for subsequent in-depth research.
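The parameter saving from replacing Conv with GhostConv can be illustrated with a short weight count. The sketch assumes a ghost convolution with ratio 2, in which a primary convolution produces half of the output channels and a 5x5 depthwise convolution generates the other half, as in common open-source implementations. The channel sizes, the 5x5 kernel and the helper names are assumptions for illustration; the paper does not report the layer configuration, and bias and normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2-D convolution without bias."""
    return (c_in // groups) * c_out * k * k

def ghost_conv_params(c_in, c_out, k, cheap_k=5):
    """Weight count of a ratio-2 ghost convolution (assumed structure):
    a primary conv makes half of the output channels, and a depthwise
    conv with kernel cheap_k derives the other half from them."""
    hidden = c_out // 2
    primary = conv_params(c_in, hidden, k)
    cheap = conv_params(hidden, hidden, cheap_k, groups=hidden)
    return primary + cheap

# Hypothetical layer: 128 input channels, 256 output channels, 3x3 kernel.
standard = conv_params(128, 256, 3)     # 128 * 256 * 9 = 294912
ghost = ghost_conv_params(128, 256, 3)  # 128 * 128 * 9 + 128 * 25 = 150656
print(standard, ghost, round(ghost / standard, 3))  # prints "294912 150656 0.511"
```

Under these assumptions the ghost variant needs roughly half the weights of the standard convolution, which is the kind of reduction that can offset the parameters added by attention modules.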
REFERENCES
[1] Zheng P, Wang Z, Chen C H, et al. A Survey of Smart Product-Service Systems: Key Aspects, Challenges and
Future Perspectives[J]. Advanced Engineering Informatics, 2019. DOI:10.1016/j.aei.2019.100973.
[2] Liu S, Wang L, Wang X V. Symbiotic human-robot collaboration: multimodal control using function blocks[J].
Procedia CIRP, 2020, 93: 1188-1193. DOI:10.1016/j.procir.2020.03.022.
[3] Wang W, Li R, Chen Y, et al. Predicting human intentions in human–robot hand-over tasks through multimodal
learning[J]. IEEE Transactions on Automation Science and Engineering, 2021, 19(3): 2339-2353.
DOI:10.1109/TASE.2021.3074873.
[4] Costanzo M, Maria G D, Lettera G, et al. A Multimodal Approach to Human Safety in Collaborative Robotic
Workcells[J]. IEEE Transactions on Automation Science and Engineering, 2021, PP(99): 1-15.
DOI:10.1109/TASE.2020.3043286.
[5] Bo Wang, Jinhua Xiao, "Exploring human intention recognition based on human robot collaboration
manufacturing toward safety production," Proc. SPIE 12803, Fifth International Conference on Artificial
Intelligence and Computer Science (AICS 2023), 128033Q (16 October 2023);
https://doi.org/10.1117/12.3009583.
[6] H. Yin, J. Xiao and G. Wang, "Human-Robot Collaboration Re-Manufacturing for Uncertain Disassembly in
Retired Battery Recycling," 2022 5th World Conference on Mechanical Engineering and Intelligent
Manufacturing (WCMEIM), Ma'anshan, China, 2022, pp. 595-598.
DOI:10.1109/WCMEIM56910.2022.10021388.
[7] Xiao, J., Gao, J., Anwer, N., and Eynard, B. (July 21, 2023). "Multi-Agent Reinforcement Learning Method for
Disassembly Sequential Task Optimization Based on Human–Robot Collaborative Disassembly in Electric
Vehicle Battery Recycling." ASME. J. Manuf. Sci. Eng. December 2023; 145(12): 121001.
https://doi.org/10.1115/1.4062235
[8] Xiao J, Anwer N, Li W, et al. Dynamic Bayesian network-based disassembly sequencing optimization for electric
vehicle battery[J]. CIRP Journal of Manufacturing Science and Technology, 2022, 38: 824-835.