Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser
DOI:
10.48550/arxiv.2403.04444
Publication Date:
2024-03-07
AUTHORS (5)
ABSTRACT
Recently, diffusion-based methods for monocular 3D human pose estimation have achieved state-of-the-art (SOTA) performance by directly regressing the 3D joint coordinates from the 2D pose sequence. Although some methods decompose the task into bone length and bone direction prediction based on the human anatomical skeleton to explicitly incorporate more human body prior constraints, the performance of these methods is significantly lower than that of the SOTA diffusion-based methods. This can be attributed to the tree structure of the human skeleton. Direct application of the disentangled method could amplify the accumulation of hierarchical errors, propagating through each hierarchy. Meanwhile, the hierarchical information has not been fully explored by previous methods. To address these problems, a Disentangled Diffusion-based 3D Human Pose Estimation method with Hierarchical Spatial and Temporal Denoiser is proposed, termed DDHPose. In our approach: (1) We disentangle the 3D pose and diffuse the bone length and bone direction during the forward process of the diffusion model to effectively model the human pose prior. A disentanglement loss is proposed to supervise diffusion model learning. (2) For the reverse process, we propose the Hierarchical Spatial and Temporal Denoiser (HSTDenoiser) to improve the hierarchical modeling of each joint. Our HSTDenoiser comprises two components: the Hierarchical-Related Spatial Transformer (HRST) and the Hierarchical-Related Temporal Transformer (HRTT). HRST exploits joint spatial information and the influence of the parent joint on each joint for spatial modeling, while HRTT utilizes information from both the joint and its hierarchical adjacent joints to explore the hierarchical temporal correlations among joints.
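The decomposition described in the abstract, and the reason errors accumulate along the skeleton tree, can be illustrated with a small sketch. This is not the DDHPose implementation: the 5-joint skeleton, the `PARENTS` table, and the function names `disentangle`, `recompose`, and `joint_errors` are all hypothetical and chosen only for illustration. The sketch splits a pose into bone lengths and unit bone directions, rebuilds the pose by walking the tree from the root, and measures how far each joint moves when a single bone is perturbed.

```python
import math

# Hypothetical 5-joint toy skeleton (an assumption, not the paper's skeleton).
# PARENTS[j] is the parent of joint j; -1 marks the root at index 0.
# Parents are listed before their children, which recompose() relies on.
PARENTS = [-1, 0, 1, 2, 0]


def disentangle(pose, parents=PARENTS):
    """Split joint coordinates into per-bone lengths and unit directions.

    Assumes no bone has zero length.
    """
    lengths, directions = {}, {}
    for j, p in enumerate(parents):
        if p < 0:
            continue  # the root has no incoming bone
        bone = [c - pc for c, pc in zip(pose[j], pose[p])]
        norm = math.sqrt(sum(b * b for b in bone))
        lengths[j] = norm
        directions[j] = [b / norm for b in bone]
    return lengths, directions


def recompose(root, lengths, directions, parents=PARENTS):
    """Rebuild joint coordinates by walking the tree from the root outward."""
    pose = [None] * len(parents)
    pose[0] = list(root)
    for j, p in enumerate(parents):
        if p < 0:
            continue
        # Each joint is its parent's position plus length * direction,
        # so any error in an ancestor bone is inherited by this joint.
        pose[j] = [pc + lengths[j] * d for pc, d in zip(pose[p], directions[j])]
    return pose


def joint_errors(pose_a, pose_b):
    """Euclidean distance between corresponding joints of two poses."""
    return [math.dist(a, b) for a, b in zip(pose_a, pose_b)]


# Toy pose: a chain 0 -> 1 -> 2 -> 3 along y, plus a side branch 0 -> 4 along x.
pose = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0],
        [0.0, 3.0, 0.0], [1.0, 0.0, 0.0]]
lengths, directions = disentangle(pose)

# Perturb only the bone that ends at joint 1, then rebuild the pose.
noisy_lengths = dict(lengths)
noisy_lengths[1] += 0.1
errors = joint_errors(pose, recompose(pose[0], noisy_lengths, directions))
```

In this toy setup, the error introduced on one bone reappears at every descendant of that bone (joints 1, 2, and 3), while the separate branch ending at joint 4 is unaffected. This is the kind of hierarchical error propagation the abstract attributes to the tree structure of the skeleton.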