BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks

DOI: 10.48550/arXiv.2312.02405 Publication Date: 2023-12
ABSTRACT
The MineRL BASALT competition has served to catalyze advances in learning from human feedback through four hard-to-specify tasks in Minecraft, such as create and photograph a waterfall. Given the completion of two years of BASALT competitions, we offer to the community a formalized benchmark through the BASALT Evaluation and Demonstrations Dataset (BEDD), which serves as a resource for algorithm development and performance assessment. BEDD consists of a collection of 26 million image-action pairs from nearly 14,000 videos of human players completing the BASALT tasks in Minecraft. It also includes over 3,000 dense pairwise human evaluations of human and algorithmic agents. These comparisons serve as a fixed, preliminary leaderboard for evaluating newly-developed algorithms. To enable this comparison, we present a streamlined codebase for benchmarking new algorithms against the leaderboard. In addition to presenting these datasets, we conduct a detailed analysis of the data in both datasets to guide algorithm development and evaluation. The released data and code are available at https://github.com/minerllabs/basalt-benchmark.
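Since the dense pairwise evaluations double as a fixed leaderboard, the sketch below illustrates one way such comparisons could be aggregated into a ranking. It is a minimal example only: the (agent_a, agent_b, winner) record format and the win-rate scoring are illustrative assumptions, not the schema or rating model used by the basalt-benchmark repository.

```python
# Minimal sketch: aggregate pairwise agent comparisons into a leaderboard
# by empirical win rate. The record format (agent_a, agent_b, winner) is
# hypothetical; consult the basalt-benchmark repo for the real data schema
# and the competition's actual rating procedure.
from collections import defaultdict

def leaderboard(comparisons):
    """comparisons: iterable of (agent_a, agent_b, winner) tuples,
    where winner is agent_a, agent_b, or None for a draw."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for a, b, winner in comparisons:
        games[a] += 1
        games[b] += 1
        if winner is not None:
            wins[winner] += 1
    # Rank agents by fraction of comparisons won (draws count as losses here).
    return sorted(((wins[x] / games[x], x) for x in games), reverse=True)

if __name__ == "__main__":
    demo = [("agent1", "agent2", "agent1"),
            ("agent2", "human", "human"),
            ("agent1", "human", None)]
    for rate, name in leaderboard(demo):
        print(f"{name}: {rate:.2f}")
```

A production leaderboard would typically replace raw win rates with a pairwise rating model that accounts for the strength of each opponent faced.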