GROOT: Learning to Follow Instructions by Watching Gameplay Videos

DOI: 10.48550/arxiv.2310.08235 Publication Date: 2023-01-01
ABSTRACT
We study the problem of building a controller that can follow open-ended instructions in open-world environments. We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations. A new learning framework is derived to allow learning such instruction-following controllers from gameplay videos while producing a video instruction encoder that induces a structured goal space. We implement our agent GROOT in a simple yet effective encoder-decoder architecture based on causal transformers. We evaluate GROOT against open-world counterparts and human players on a proposed Minecraft SkillForge benchmark. The Elo ratings clearly show that GROOT is closing the human-machine gap as well as exhibiting a 70% winning rate over the best generalist agent baseline. Qualitative analysis of the induced goal space further demonstrates some interesting emergent properties, including goal composition and complex gameplay behavior synthesis. The project page is available at https://craftjarvis-groot.github.io.
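To give a sense of scale for the reported 70% winning rate, the sketch below converts a head-to-head win rate into a rating gap under the standard logistic Elo model. It is an illustration only: the abstract does not state how the Elo ratings were computed, so the 400-point scale, the treatment of the win rate as an expected score (ignoring draws), and the function names `expected_score` and `rating_gap_for_win_rate` are all assumptions.

```python
import math


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of player A against player B under the standard Elo model.

    Assumption: the conventional 400-point logistic scale; the benchmark's
    actual rating procedure may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def rating_gap_for_win_rate(win_rate: float, scale: float = 400.0) -> float:
    """Rating difference (A minus B) that yields the given expected score for A."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must lie strictly between 0 and 1")
    return scale * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    gap = rating_gap_for_win_rate(0.70)
    print(f"Elo gap for a 70% win rate: {gap:.1f}")
    print(f"Round trip expected score: {expected_score(gap, 0.0):.2f}")
```

Under these assumptions, a 70% winning rate corresponds to a gap of roughly 147 Elo points between the two agents.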