XFormer: Fast and Accurate Monocular 3D Body Capture

Keywords: Monocular, RGB color model

DOI: 10.48550/arxiv.2305.11101
Publication Date: 2023-01-01

AUTHORS (9)

Lihui Qian

Xintong Han

Faqiang Wang

Hongyu Liu

Haoye Dong

Zhiwen Li

Huawei Wei

Zhe Lin

Cheng‐Bin Jin

ABSTRACT

We present XFormer, a novel human mesh and motion capture method that achieves real-time performance on consumer CPUs given only monocular images as input. The proposed network architecture contains two branches: a keypoint branch that estimates 3D human mesh vertices given 2D keypoints, and an image branch that makes predictions directly from the RGB image features. At the core of our method is a cross-modal transformer block that allows information to flow across these two branches by modeling the attention between 2D keypoint coordinates and image spatial features. Our architecture is smartly designed, which enables us to train on various types of datasets, including images with 2D/3D annotations, images with 3D pseudo labels, and motion capture datasets that do not have associated images. This effectively improves the accuracy and generalization ability of our system. Built on a lightweight backbone (MobileNetV3), our method runs blazing fast (over 30fps on a single CPU core) and still yields competitive accuracy. Furthermore, with an HRNet backbone, XFormer delivers state-of-the-art performance on the Human3.6M and 3DPW datasets.
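
The cross-modal attention idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single attention head, random projection weights, and hypothetical sizes (17 keypoint tokens, 49 image patch tokens, 32 feature channels), and it omits the residual connections, normalization, and feed-forward layers a full transformer block would normally have. It only shows how tokens from one branch can attend to tokens from the other.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    # Scaled dot-product attention: each query token gathers
    # information from every token of the other modality.
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Hypothetical sizes, chosen only for illustration.
num_keypoints, num_patches, dim = 17, 49, 32
rng = np.random.default_rng(0)

# Stand-ins for embedded 2D keypoint coordinates and image spatial features.
keypoint_tokens = rng.normal(size=(num_keypoints, dim))
image_tokens = rng.normal(size=(num_patches, dim))

# Random projection matrices (assumption: learned in a real model).
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

# Keypoint branch attends to image features, and vice versa.
kp_updated, kp_to_img = cross_attention(keypoint_tokens, image_tokens, w_q, w_k, w_v)
img_updated, img_to_kp = cross_attention(image_tokens, keypoint_tokens, w_q, w_k, w_v)

print(kp_updated.shape, img_updated.shape)
```

Each row of `kp_to_img` is a probability distribution over image patches for one keypoint, so the updated keypoint token is a weighted mixture of image features; `img_to_kp` plays the same role in the opposite direction.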
