RWKV: Reinventing RNNs for the Transformer Era
DOI: 10.48550/arXiv.2305.13048
Publication Date: 2023-01-01
AUTHORS (30)
ABSTRACT
Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the performance of Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, which parallelizes computations during training and maintains constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find that RWKV performs on par with similarly sized Transformers, suggesting that future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling the trade-offs between computational efficiency and model performance in sequence processing tasks.
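To make the constant-memory inference claim concrete, the sketch below evaluates the paper's WKV (weighted key value) operator in its recurrent, RNN-mode form: the running numerator/denominator state has a fixed size per channel, independent of sequence length. This is a minimal NumPy sketch, assuming the WKV recurrence with per-channel decay w and current-step bonus u described in the paper; the function name, the plain-NumPy form, and the omission of the numerical-stability trick used in practice are illustrative choices, not the reference implementation.

import numpy as np

def wkv_recurrent(k, v, w, u):
    # Sequential (RNN-mode) evaluation of a WKV-style operator.
    # k, v : (T, C) keys and values for T timesteps and C channels.
    # w    : (C,) per-channel decay rate (state is scaled by exp(-w) each step).
    # u    : (C,) per-channel bonus applied to the current timestep only.
    # Returns a (T, C) array of outputs.
    T, C = k.shape
    a = np.zeros(C)          # running weighted sum of values (numerator)
    b = np.zeros(C)          # running sum of weights (denominator)
    out = np.zeros((T, C))
    for t in range(T):
        e_k = np.exp(k[t])
        e_uk = np.exp(u + k[t])
        # Output at step t: accumulated past plus a bonus-weighted current token.
        out[t] = (a + e_uk * v[t]) / (b + e_uk)
        # Update the fixed-size state: decay the past, absorb the current token.
        a = np.exp(-w) * a + e_k * v[t]
        b = np.exp(-w) * b + e_k
    return out

# Example: the carried state is only two (C,) vectors, regardless of T.
T, C = 8, 4
rng = np.random.default_rng(0)
out = wkv_recurrent(rng.normal(size=(T, C)), rng.normal(size=(T, C)),
                    w=np.ones(C) * 0.5, u=np.zeros(C))
print(out.shape)  # (8, 4)

Because the loop carries only a and b forward, per-token inference cost and memory stay constant in T; the same operator can also be unrolled as a softmax-free weighted sum over all past keys, which is what permits the parallel, Transformer-style training the abstract describes.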