Implicit Language Models are RNNs: Balancing Parallelization and Expressivity
DOI:
10.48550/arxiv.2502.07827
Publication Date:
2025-02-10
AUTHORS (6)
ABSTRACT
State-space models (SSMs) and transformers dominate the language modeling landscape. However, they are constrained to a lower computational complexity than classical recurrent neural networks (RNNs), limiting their expressivity. In contrast, RNNs lack parallelization during training, raising fundamental questions about the trade-off between parallelization and expressivity. We propose implicit SSMs, which iterate a transformation until convergence to a fixed point. Theoretically, we show that implicit SSMs implement the non-linear state-transitions of RNNs. Empirically, we find that only an approximate fixed point suffices, enabling the design of a scalable training curriculum that largely retains parallelization, with full convergence required only for a small subset of tokens. Our approach demonstrates superior state-tracking capabilities on regular languages, surpassing SSMs. We further scale implicit SSMs to natural language reasoning tasks and to pretraining of large-scale language models with up to 1.3B parameters on 207B tokens, representing, to our knowledge, the largest implicit model trained to date. Notably, our implicit models outperform their explicit counterparts on standard benchmarks.
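The core mechanism described in the abstract, iterating a nonlinear state transition until (approximate) convergence to a fixed point, can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; the update rule h = tanh(A h + B x), the function name implicit_step, and all shapes and tolerances are assumptions made for illustration only.

# Illustrative sketch of fixed-point iteration for an "implicit" recurrent update.
# All names, shapes, and the specific nonlinearity are assumptions, not the paper's code.
import numpy as np

def implicit_step(A, B, x_t, h_init, tol=1e-5, max_iters=50):
    """Approximately solve h = tanh(A @ h + B @ x_t) by fixed-point iteration.

    Running only a few iterations resembles an explicit update; iterating to
    convergence yields the implicit, RNN-like nonlinear state transition.
    Returns the (approximate) fixed point and the number of iterations used.
    """
    h = h_init
    for k in range(max_iters):
        h_next = np.tanh(A @ h + B @ x_t)
        if np.linalg.norm(h_next - h) < tol:  # approximate convergence test
            return h_next, k + 1
        h = h_next
    return h, max_iters

# Tiny usage example with random weights (purely illustrative).
rng = np.random.default_rng(0)
d_state, d_in = 8, 4
A = 0.5 * rng.standard_normal((d_state, d_state)) / np.sqrt(d_state)  # scaled so the map contracts
B = rng.standard_normal((d_state, d_in)) / np.sqrt(d_in)
h, iters = implicit_step(A, B, rng.standard_normal(d_in), np.zeros(d_state))
print(f"converged in {iters} iterations")

In this toy setting, the weight scaling keeps the update a contraction, so the iteration converges quickly; stopping it early corresponds to the approximate fixed points that the abstract reports as sufficient for most tokens.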