Recurrent Memory Transformer
DOI:
10.48550/arxiv.2207.06881
Publication Date:
2022-01-01
AUTHORS (3)
ABSTRACT
Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows combining information from all sequence elements into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this work, we propose and study a memory-augmented segment-level recurrent Transformer (Recurrent Memory Transformer, RMT). Memory allows storing and processing local and global information, as well as passing information between segments of a long sequence with the help of recurrence. We implement the memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence. The model is then trained to control both memory operations and sequence representation processing. Results of experiments show that RMT performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it on tasks that require longer sequence processing. We also show that adding memory tokens to Tr-XL is able to improve its performance. This makes the Recurrent Memory Transformer a promising architecture for applications that require learning long-term dependencies and for general-purpose memory processing, such as algorithmic tasks and reasoning.
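
The abstract describes adding special memory tokens to each segment and passing their updated states forward through a segment-level recurrence. The listing below is a minimal sketch of that idea, not the authors' reference implementation: the class name RMTSketch, the hyperparameters, the use of a stock nn.TransformerEncoder, and the choice to prepend memory tokens only at the start of each segment (a simplification of the paper's read/write memory placement) are all assumptions made for illustration.

# Minimal sketch of segment-level memory recurrence (assumed names and
# hyperparameters; prepend-only memory is a simplification of the paper).
import torch
import torch.nn as nn


class RMTSketch(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, num_mem_tokens=4,
                 nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable memory tokens prepended to every segment.
        self.mem_tokens = nn.Parameter(torch.randn(num_mem_tokens, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.num_mem_tokens = num_mem_tokens

    def forward(self, segments):
        """segments: list of [batch, seg_len] token-id tensors."""
        batch = segments[0].size(0)
        # Initial memory: the same learned tokens for every sequence in the batch.
        memory = self.mem_tokens.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # Concatenate memory states with the segment's token embeddings.
            x = torch.cat([memory, self.embed(seg)], dim=1)
            h = self.encoder(x)
            # Updated memory states are carried to the next segment,
            # implementing the recurrence over segments.
            memory = h[:, :self.num_mem_tokens, :]
            outputs.append(h[:, self.num_mem_tokens:, :])
        return outputs, memory


# Usage: a long sequence split into two segments of length 16.
model = RMTSketch()
segments = [torch.randint(0, 1000, (2, 16)) for _ in range(2)]
outs, final_memory = model(segments)
print([o.shape for o in outs], final_memory.shape)

Because only the small, fixed-size set of memory token states is passed between segments, the recurrent state stays constant in size regardless of total sequence length, which is what lets this scheme sidestep the quadratic cost of attending over the full input at once.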