Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner

DOI: 10.48550/arxiv.2409.19949 Publication Date: 2024-09-30
ABSTRACT
Diffusion models have demonstrated their capabilities in modeling multi-task trajectories. However, existing multi-task planners or policies typically either rely on task-specific demonstrations via imitation, or require task-specific reward labels to facilitate policy optimization with Reinforcement Learning (RL). To address these challenges, we aim to develop a versatile diffusion planner that can leverage large-scale inferior data containing task-agnostic, sub-optimal trajectories, with the ability to fast adapt to specific tasks. In this paper, we propose \textbf{SODP}, a two-stage framework that leverages \textbf{S}ub-\textbf{O}ptimal data to learn a \textbf{D}iffusion \textbf{P}lanner that is generalizable to various downstream tasks. Specifically, in the pre-training stage, we train a foundation planner that extracts general planning capabilities by modeling a trajectory distribution that may be sub-optimal but has wide coverage. Then, for downstream tasks, we adopt RL-based fine-tuning with task-specific rewards to refine the planner so that it generates action sequences with higher returns. Experimental results from domains including Meta-World and Adroit demonstrate that SODP outperforms state-of-the-art methods with only a small amount of data for reward-guided fine-tuning.
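
To make the two-stage recipe concrete, the sketch below illustrates the general idea in PyTorch: a small noise-prediction network is first pre-trained with a standard DDPM denoising loss on reward-free, sub-optimal action sequences, and then fine-tuned on task data using a simple return-weighted denoising loss as a stand-in for the paper's RL-based fine-tuning. All shapes, network sizes, dataset placeholders, and the reward-weighting scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the pre-train / fine-tune recipe described in the abstract.
# NOT the SODP implementation: the toy data, network, and reward-weighted
# fine-tuning loss are assumptions for illustration only.
import torch
import torch.nn as nn

HORIZON, ACT_DIM, T_DIFF = 8, 4, 100  # assumed planning horizon, action dim, diffusion steps


class NoisePredictor(nn.Module):
    """Predicts the noise added to a flattened action sequence at diffusion step t."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HORIZON * ACT_DIM + 1, 256), nn.SiLU(),
            nn.Linear(256, 256), nn.SiLU(),
            nn.Linear(256, HORIZON * ACT_DIM),
        )

    def forward(self, x_t, t):
        t_emb = t.float().unsqueeze(-1) / T_DIFF  # crude timestep embedding
        return self.net(torch.cat([x_t, t_emb], dim=-1))


def make_alphas_bar():
    betas = torch.linspace(1e-4, 2e-2, T_DIFF)
    return torch.cumprod(1.0 - betas, dim=0)


def denoising_loss(model, x0, alphas_bar, weights=None):
    """Standard DDPM noise-prediction loss, optionally weighted per sample."""
    b = x0.shape[0]
    t = torch.randint(0, T_DIFF, (b,))
    noise = torch.randn_like(x0)
    ab = alphas_bar[t].unsqueeze(-1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise
    err = (model(x_t, t) - noise).pow(2).mean(dim=-1)
    if weights is not None:  # return weighting used only in the fine-tuning stage
        err = err * weights
    return err.mean()


if __name__ == "__main__":
    model = NoisePredictor()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    alphas_bar = make_alphas_bar()

    # Stage 1: task-agnostic pre-training on sub-optimal, reward-free trajectories.
    suboptimal_actions = torch.randn(512, HORIZON * ACT_DIM)  # placeholder dataset
    for _ in range(100):
        batch = suboptimal_actions[torch.randperm(512)[:64]]
        loss = denoising_loss(model, batch, alphas_bar)
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: task-guided fine-tuning. Here we simply up-weight high-return
    # sequences (a reward-weighted-regression stand-in for the paper's
    # RL-based fine-tuning with task-specific rewards).
    task_actions = torch.randn(128, HORIZON * ACT_DIM)        # placeholder task data
    returns = torch.rand(128)                                  # placeholder task returns
    weights = torch.softmax(returns / 0.1, dim=0) * 128        # normalized return weights
    for _ in range(50):
        idx = torch.randperm(128)[:32]
        loss = denoising_loss(model, task_actions[idx], alphas_bar, weights[idx])
        opt.zero_grad(); loss.backward(); opt.step()
```

In this sketch the fine-tuning stage reuses the pre-trained weights and only changes the loss weighting, which mirrors the abstract's claim that a small amount of reward-guided fine-tuning suffices to specialize the general planner to a downstream task.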