A RGB-D feature fusion network for occluded object 6D pose estimation

Y Song, C Tang - Signal, Image and Video Processing, 2024 - Springer
Abstract
6D pose estimation using RGB-D data has been widely utilized in various scenarios, with keypoint-based methods receiving significant attention due to their exceptional performance. However, these methods still face numerous challenges, especially when the object is heavily occluded or truncated. To address this issue, we propose a novel cross-modal fusion network. Specifically, our approach first employs object detection to identify the potential position of the object and randomly samples points within this region. Subsequently, a specially designed feature extraction network extracts appearance features from the RGB image and geometry features from the depth image, respectively; these features are then implicitly aggregated through cross-modal fusion. Finally, keypoints are used to estimate the pose of the object. The proposed method is extensively tested on the Occlusion Linemod and Truncation Linemod datasets. Experimental results demonstrate that our method achieves significant improvements, validating the effectiveness of the cross-modal feature fusion strategy in enhancing the accuracy of keypoint-based pose estimation from RGB-D images.
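The abstract outlines a pipeline of detection, random sampling within the detected region, per-modality feature extraction, and cross-modal fusion. The paper does not specify the fusion details in the abstract, so the sketch below is only a hedged illustration: it assumes per-point fusion by concatenating appearance and geometry features plus a pooled global context vector (a common design in RGB-D fusion networks); all dimensions, names, and the fusion scheme are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not taken from the paper):
N = 500        # points randomly sampled inside the detected bounding box
D_RGB = 32     # per-point appearance feature size (RGB branch output)
D_GEO = 32     # per-point geometry feature size (depth branch output)

def sample_in_box(box, n, rng):
    """Randomly sample n pixel coordinates inside a detected box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    xs = rng.integers(x0, x1, size=n)
    ys = rng.integers(y0, y1, size=n)
    return np.stack([ys, xs], axis=1)          # (n, 2) pixel coordinates

def fuse(app_feats, geo_feats):
    """Per-point cross-modal fusion: concatenate the two modalities, then
    append a mean-pooled global context vector to every point."""
    local = np.concatenate([app_feats, geo_feats], axis=1)   # (n, D_RGB + D_GEO)
    global_ctx = local.mean(axis=0, keepdims=True)           # (1, D_RGB + D_GEO)
    return np.concatenate(
        [local, np.repeat(global_ctx, len(local), axis=0)], axis=1
    )                                                        # (n, 2*(D_RGB + D_GEO))

# Stand-ins for the outputs of the two feature-extraction branches.
pts = sample_in_box(box=(100, 120, 300, 360), n=N, rng=rng)
app = rng.standard_normal((N, D_RGB))   # appearance features from the RGB image
geo = rng.standard_normal((N, D_GEO))   # geometry features from the depth image

fused = fuse(app, geo)
print(fused.shape)  # (500, 128): per-point fused features fed to keypoint regression
```

In such designs, the fused per-point features would then drive keypoint offset prediction, from which the 6D pose is recovered; that final stage is omitted here since the abstract gives no detail on it.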