XFormer: Fast and Accurate Monocular 3D Body Capture
Monocular
RGB color model
DOI:
10.48550/arxiv.2305.11101
Publication Date:
2023-01-01
AUTHORS (9)
ABSTRACT
We present XFormer, a novel human mesh and motion capture method that achieves real-time performance on consumer CPUs given only monocular images as input. The proposed network architecture contains two branches: a keypoint branch that estimates 3D human mesh vertices given 2D keypoints, and an image branch that makes predictions directly from the RGB image features. At the core of our method is a cross-modal transformer block that allows information to flow across these two branches by modeling the attention between 2D keypoint coordinates and image spatial features. Our architecture is smartly designed, which enables us to train on various types of datasets, including images with 2D/3D annotations, images with 3D pseudo labels, and motion capture datasets that do not have associated images. This effectively improves the accuracy and generalization ability of our system. Built on a lightweight backbone (MobileNetV3), our method runs blazing fast (over 30 fps on a single CPU core) and still yields competitive accuracy. Furthermore, with an HRNet backbone, XFormer delivers state-of-the-art performance on the Human3.6M and 3DPW datasets.
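The cross-modal attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention with residual connections, random weights, 17 keypoints, a 7x7 feature map flattened to 49 tokens, and a 32-dimensional embedding. Every name and dimension below (`cross_attention`, `cross_modal_block`, `kp_tokens`, `img_tokens`) is hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention from `queries` to `context`."""
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights

def cross_modal_block(kp_tokens, img_tokens, params):
    # Keypoint tokens attend to image tokens, and image tokens attend to
    # keypoint tokens, so information flows in both directions.
    kp_update, kp_attn = cross_attention(kp_tokens, img_tokens, *params["kp_from_img"])
    img_update, img_attn = cross_attention(img_tokens, kp_tokens, *params["img_from_kp"])
    # Residual connections (an assumption; normalization and MLP layers omitted).
    return kp_tokens + kp_update, img_tokens + img_update, kp_attn, img_attn

rng = np.random.default_rng(0)
num_keypoints, num_patches, dim = 17, 49, 32  # assumed sizes

# Hypothetical inputs: 2D keypoint coordinates embedded by a linear map,
# and random vectors standing in for flattened backbone features.
kp_coords = rng.uniform(-1.0, 1.0, size=(num_keypoints, 2))
w_embed = rng.normal(scale=0.1, size=(2, dim))
kp_tokens = kp_coords @ w_embed
img_tokens = rng.normal(size=(num_patches, dim))

def make_proj():
    # Query, key, and value projection matrices with random weights.
    return tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

params = {"kp_from_img": make_proj(), "img_from_kp": make_proj()}
kp_out, img_out, kp_attn, img_attn = cross_modal_block(kp_tokens, img_tokens, params)
print(kp_out.shape, img_out.shape)
```

In this sketch each branch keeps its own token count, and only the attention weights couple the two modalities. A real model would stack several such blocks and learn the projections during training.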