Boosting Deductive Reasoning with Step Signals In RLHF

DOI: 10.48550/arxiv.2410.09528 Publication Date: 2024-10-12
ABSTRACT
Logical reasoning is a crucial task for Large Language Models (LLMs), enabling them to tackle complex problems. Among reasoning tasks, multi-step reasoning poses a particular challenge. Grounded in the theory of formal logic, we have developed an automated method, Multi-step Deduction (MuseD), for generating deductive reasoning data. MuseD has allowed us to create training and testing datasets for multi-step reasoning. Our generation method enables control over the complexity of the generated instructions, facilitating the evaluation of models across different difficulty levels. Through RLHF training, our data demonstrated significant improvements in logical capabilities on both in-domain and out-of-domain tasks. Additionally, we conducted tests to assess the multi-step reasoning abilities of various models.