Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment

DOI: 10.48550/arxiv.2502.13170 Publication Date: 2025-02-17
ABSTRACT
Reasoning ability is one of the most enigmatic and captivating aspects of large language models (LLMs). Numerous studies are dedicated to exploring and expanding the boundaries of this capability. However, tasks that embody both reasoning and recall characteristics are often overlooked. In this paper, we introduce such a novel task, code reasoning, to provide a new perspective on the reasoning abilities of LLMs. We summarize three meta-benchmarks based on established forms of logical reasoning and instantiate these into eight specific benchmark tasks. Our testing on these benchmarks reveals that LLMs continue to struggle with identifying satisfactory reasoning pathways. Additionally, we present a new pathway exploration pipeline inspired by intricate human problem-solving methods. This Reflective Hypothesis Decomposition and Amendment (RHDA) pipeline consists of the following iterative steps: (1) proposing potential hypotheses based on observations and decomposing them; (2) utilizing tools to validate the hypotheses and reflecting on the outcomes; (3) revising the hypotheses in light of the observations. Our approach effectively mitigates logical chain collapses arising from forgetting or hallucination issues in multi-step reasoning, resulting in performance gains of up to $3\times$. Finally, we expand this pipeline by applying it to simulate complex household tasks in real-world scenarios, specifically in VirtualHome, enhancing the handling of failure cases. We release our code and all results at https://github.com/TnTWoW/code_reasoning.
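The three iterative steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `Hypothesis`, `validate`, `rhda_loop`, `toy_propose`, and `toy_amend` are hypothetical, the language model is replaced by hand-written stand-in functions, and the task (recovering a numeric function from input/output pairs) is a toy chosen only to make the loop runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Observation = Tuple[int, int]   # (input, expected output)
Step = Callable[[int], int]     # one sub-hypothesis


@dataclass
class Hypothesis:
    steps: List[Step]           # decomposed sub-hypotheses, applied in order

    def run(self, x: int) -> int:
        for step in self.steps:
            x = step(x)
        return x


def validate(h: Hypothesis, observations: List[Observation]) -> List[Observation]:
    """Tool-based check: execute the hypothesis, return the observations it fails."""
    return [(x, y) for x, y in observations if h.run(x) != y]


def rhda_loop(propose, amend, observations, max_rounds=3):
    """Propose a hypothesis, validate it, and amend it until it fits or rounds run out."""
    hypothesis = propose(observations)            # step (1): propose and decompose
    failures = validate(hypothesis, observations)  # step (2): validate with a tool
    rounds = 0
    while failures and rounds < max_rounds:
        hypothesis = amend(hypothesis, failures)   # step (3): revise from feedback
        failures = validate(hypothesis, observations)
        rounds += 1
    return hypothesis, failures


# Hypothetical stand-ins for the model calls (assumptions, not from the paper).
def toy_propose(observations):
    # Initial guess: the target function doubles its input.
    return Hypothesis(steps=[lambda x: x * 2])


def toy_amend(h, failures):
    # Use the first failing case to append a constant-offset correction step.
    x, y = failures[0]
    offset = y - h.run(x)
    return Hypothesis(steps=h.steps + [lambda v, d=offset: v + d])


observations = [(1, 3), (2, 5), (3, 7)]   # consistent with f(x) = 2x + 1
final, remaining = rhda_loop(toy_propose, toy_amend, observations)
```

In this toy run the first hypothesis fails every observation, the amendment appends one corrective sub-step, and the second validation pass leaves no failures. In a real pipeline the `propose` and `amend` callables would presumably wrap model calls, and `validate` would wrap an external executor.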