Understanding Transformer Reasoning Capabilities via Graph Algorithms
FOS: Computer and information sciences
Machine Learning (cs.LG)
Artificial Intelligence (cs.AI)
DOI:
10.48550/arxiv.2405.18512
Publication Date:
2024-05-28
AUTHORS (8)
ABSTRACT
Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network's depth, width, and number of extra tokens for algorithm execution. Our novel representational hierarchy separates 9 algorithmic reasoning problems into classes solvable by transformers in different realistic parameter scaling regimes. We prove that logarithmic depth is necessary and sufficient for tasks like graph connectivity, while single-layer transformers with small embedding dimensions can solve contextual retrieval tasks. We also support our theoretical analysis with ample empirical evidence using the GraphQA benchmark. These results show that transformers excel at many graph reasoning tasks, even outperforming specialized graph neural networks.
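The logarithmic-depth claim for graph connectivity echoes the classic parallel algorithm in which the boolean adjacency matrix is repeatedly squared: after roughly log2(n) squarings, entry (i, j) indicates whether j is reachable from i, so each squaring round corresponds to one layer of parallel computation. The following sketch is illustrative only and is not code from the paper; the function names are hypothetical.

```python
import math

def reachability(adj):
    """Transitive closure of a directed graph via repeated boolean
    matrix squaring: ceil(log2(n)) rounds suffice, mirroring the
    log-depth parallel computation the abstract alludes to."""
    n = len(adj)
    # Include self-loops so that paths shorter than 2^k survive each round.
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    for _ in range(max(1, math.ceil(math.log2(n)))):
        # Boolean matrix "square": a path of length <= 2^(k+1) splits
        # into two halves of length <= 2^k meeting at some vertex k.
        reach = [[any(reach[i][m] and reach[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
    return reach

def connected(adj, s, t):
    """True iff t is reachable from s."""
    return reachability(adj)[s][t]
```

For example, on the path graph 0 -> 1 -> 2 with an isolated vertex 3, `connected(adj, 0, 2)` is `True` while `connected(adj, 0, 3)` is `False`.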