Understanding Transformer Reasoning Capabilities via Graph Algorithms

FOS: Computer and information sciences; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
DOI: 10.48550/arxiv.2405.18512
Publication Date: 2024-05-28
ABSTRACT
Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their algorithmic reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network's depth, width, and number of extra tokens for algorithm execution. Our novel representational hierarchy separates 9 algorithmic reasoning problems into classes solvable by transformers in different realistic parameter scaling regimes. We prove that logarithmic depth is necessary and sufficient for tasks like graph connectivity, while single-layer transformers with small embedding dimensions can solve contextual retrieval tasks. We also support our theoretical analysis with ample empirical evidence using the GraphQA benchmark. These results show that transformers excel at many graph reasoning tasks, even outperforming specialized graph neural networks.
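
The logarithmic-depth claim for graph connectivity has a classical algorithmic analogue: repeatedly squaring the Boolean adjacency matrix decides reachability in O(log n) rounds, because each squaring doubles the maximum path length captured. The sketch below illustrates that intuition only; it is a plain NumPy toy (the function name `connected` and the example graph are our own choices), not the paper's transformer construction.

    import numpy as np

    def connected(adj: np.ndarray, u: int, v: int) -> bool:
        """Decide whether u and v lie in the same connected component,
        given a Boolean n x n adjacency matrix `adj`."""
        n = adj.shape[0]
        # Paths of length <= 1: direct edges plus self-loops.
        reach = adj | np.eye(n, dtype=bool)
        length = 1
        while length < n:
            # One squaring round: reach[i, j] becomes True iff some midpoint k
            # satisfies reach[i, k] and reach[k, j]. The maximum path length
            # captured doubles, so ceil(log2(n)) rounds suffice.
            reach = (reach.astype(np.int64) @ reach.astype(np.int64)) > 0
            length *= 2
        return bool(reach[u, v])

    # Tiny example: path 0-1-2 plus an isolated node 3.
    A = np.zeros((4, 4), dtype=bool)
    A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = True
    print(connected(A, 0, 2))  # True
    print(connected(A, 0, 3))  # False

Each squaring round plays the role of one layer of parallel computation, which is why a network of logarithmic depth, rather than depth proportional to the graph's diameter, can plausibly track reachability.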