BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks

DOI: 10.48550/arXiv.2312.02405 Publication Date: 2023-12
ABSTRACT
The MineRL BASALT competition has served to catalyze advances in learning from human feedback through four hard-to-specify tasks in Minecraft, such as create and photograph a waterfall. Given the completion of two years of BASALT competitions, we offer to the community a formalized benchmark through the BASALT Evaluation and Demonstrations Dataset (BEDD), which serves as a resource for algorithm development and performance assessment. BEDD consists of a collection of 26 million image-action pairs from nearly 14,000 videos of human players completing the BASALT tasks in Minecraft. It also includes over 3,000 dense pairwise human evaluations of human and algorithmic agents. These comparisons serve as a fixed, preliminary leaderboard for evaluating newly-developed algorithms. To enable this comparison, we present a streamlined codebase for benchmarking new algorithms against the leaderboard. In addition to presenting these datasets, we conduct a detailed analysis of the data in both datasets to guide algorithm development and evaluation. The released data and code are available at https://github.com/minerllabs/basalt-benchmark.
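Since the dense pairwise evaluations double as a fixed leaderboard, the sketch below illustrates one way such comparisons could be aggregated into a ranking. It is a minimal example only: the (agent_a, agent_b, winner) record format and the win-rate scoring are illustrative assumptions, not the schema or rating model used by the basalt-benchmark repository.

```python
# Minimal sketch: aggregate pairwise agent comparisons into a leaderboard
# by empirical win rate. The record format (agent_a, agent_b, winner) is
# hypothetical; consult the basalt-benchmark repo for the real data schema
# and the competition's actual rating procedure.
from collections import defaultdict

def leaderboard(comparisons):
    """comparisons: iterable of (agent_a, agent_b, winner) tuples,
    where winner is agent_a, agent_b, or None for a draw."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for a, b, winner in comparisons:
        games[a] += 1
        games[b] += 1
        if winner is not None:
            wins[winner] += 1
    # Rank agents by fraction of comparisons won (draws count as losses here).
    return sorted(((wins[x] / games[x], x) for x in games), reverse=True)

if __name__ == "__main__":
    demo = [("agent1", "agent2", "agent1"),
            ("agent2", "human", "human"),
            ("agent1", "human", None)]
    for rate, name in leaderboard(demo):
        print(f"{name}: {rate:.2f}")
```

A production leaderboard would typically replace raw win rates with a pairwise rating model that accounts for the strength of each opponent faced.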