The CLRS-Text Algorithmic Reasoning Language Benchmark

Benchmark
DOI: 10.48550/arxiv.2406.04229 Publication Date: 2024-06-06
ABSTRACT
Eliciting reasoning capabilities from language models (LMs) is a critical direction on the path towards building intelligent systems. Most recent studies dedicated to this goal focus on out-of-distribution performance over procedurally-generated synthetic benchmarks, each bespoke-built to evaluate specific skills only. This trend makes results hard to transfer across publications, slowing down progress. Three years ago, a similar issue was identified and rectified in the field of neural algorithmic reasoning, with the advent of the CLRS benchmark: a dataset generator comprising graph execution traces of classical algorithms from the Introduction to Algorithms textbook. Inspired by this, we propose CLRS-Text -- a textual version of these algorithmic traces. Out of the box, CLRS-Text is capable of procedurally generating trace data for thirty diverse, challenging algorithmic tasks across any desirable input distribution, while offering a standard pipeline by which additional tasks may be created in the benchmark. We fine-tune and evaluate various LMs as generalist executors on this benchmark, validating prior work and revealing a novel, interesting challenge for the LM community. Our code is available at https://github.com/google-deepmind/clrs/tree/master/clrs/_src/clrs_text.
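As a quick illustration of the procedural generation the abstract describes, the following is a minimal sketch of sampling one algorithmic trace and rendering it as text. The sampler calls (clrs.build_sampler, sampler.next) are documented in the CLRS repository's README; the text-formatting helper and its signature (clrs_utils.format_clrs_example, use_hints) are assumptions based on the clrs_text directory linked above, so consult that module for the canonical entry points.

import clrs
from clrs._src.clrs_text import clrs_utils

# Build a procedural sampler for one of the thirty CLRS-30 tasks.
# build_sampler returns (sampler, spec); `length` controls the input
# size, so varying it probes out-of-distribution generalization.
sampler, _ = clrs.build_sampler(
    name='insertion_sort',
    num_samples=100,
    length=8,
)

# Draw one graph execution trace from the sampled pool.
feedback = sampler.next(batch_size=1)

# Render the trace as a (question, answer) text pair for LM fine-tuning.
# NOTE: helper name and arguments are assumed, not confirmed API.
question, answer = clrs_utils.format_clrs_example(
    'insertion_sort', feedback, use_hints=True)
print(question)
print(answer)

Because both the input distribution and the algorithm are parameters of the sampler, the same script can generate training data at one length and evaluation data at another, which is the benchmark's intended out-of-distribution protocol.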