Geometry-Biased Transformer for Robust Multi-View 3D Human Pose Reconstruction

Moliner, Olivier; Huang, Sangxia; Åström, Kalle

Computer Science > Computer Vision and Pattern Recognition

arXiv:2312.17106 (cs)

[Submitted on 28 Dec 2023]

Abstract: We address the challenges in estimating 3D human poses from multiple views under occlusion and with limited overlapping views. We approach multi-view, single-person 3D human pose reconstruction as a regression problem and propose a novel encoder-decoder Transformer architecture to estimate 3D poses from multi-view 2D pose sequences. The encoder refines 2D skeleton joints detected across different views and times, fusing multi-view and temporal information through global self-attention. We enhance the encoder by incorporating a geometry-biased attention mechanism, effectively leveraging geometric relationships between views. Additionally, we use detection scores provided by the 2D pose detector to further guide the encoder's attention based on the reliability of the 2D detections. The decoder subsequently regresses the 3D pose sequence from these refined tokens, using pre-defined queries for each joint. To enhance the generalization of our method to unseen scenes and improve resilience to missing joints, we implement strategies including scene centering, synthetic views, and token dropout. We conduct extensive experiments on three benchmark public datasets, Human3.6M, CMU Panoptic and Occlusion-Persons. Our results demonstrate the efficacy of our approach, particularly in occluded scenes and when few views are available, which are traditionally challenging scenarios for triangulation-based methods.
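The abstract does not state how the geometric bias or the detection scores enter the attention computation. As a rough illustration only, the sketch below assumes both act as additive terms on the attention logits: a pairwise matrix `geom_bias` (hypothetical, standing in for whatever geometric relationship between views is used) and the log of each key token's detection score. The function name `biased_attention`, its parameters, and the log-score form are assumptions made for this example, not the authors' formulation.

```python
import numpy as np

def biased_attention(q, k, v, geom_bias, scores):
    """Scaled dot-product attention with additive logit biases (illustrative).

    q, k, v:    (n_tokens, d) arrays of queries, keys, values
    geom_bias:  (n_tokens, n_tokens) hypothetical pairwise geometric bias
    scores:     (n_tokens,) 2D detection confidences in (0, 1]
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    # Assumed form: add the geometric bias, plus the log-confidence of each
    # key token, so unreliable detections receive less attention.
    logits = logits + geom_bias + np.log(np.clip(scores, 1e-6, 1.0))[None, :]
    # Numerically stable softmax over the key dimension.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n, d = 4, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
geom_bias = np.zeros((n, n))               # no geometric preference in this toy case
scores = np.array([0.9, 0.9, 1e-6, 0.9])   # third token is an unreliable detection
out, weights = biased_attention(q, k, v, geom_bias, scores)
```

In this toy setup the third token's near-zero score pushes its column of `weights` toward zero, so the other tokens effectively ignore it while each row still sums to one.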

Comments:	Accepted: 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.17106 [cs.CV]
	(or arXiv:2312.17106v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.17106

Submission history

From: Olivier Moliner
[v1] Thu, 28 Dec 2023 16:30:05 UTC (167 KB)
