Sparse Sinkhorn Attention
FOS: Computer and information sciences
Computer Science - Machine Learning
Computer Science - Computation and Language
Computation and Language (cs.CL)
Machine Learning (cs.LG)
DOI: 10.48550/arXiv.2002.11296
Publication Date: 2020-01-01
AUTHORS (5)
ABSTRACT
We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations. Concretely, we introduce a meta sorting network that learns to generate latent permutations over sequences. Given sorted sequences, we are then able to compute quasi-global attention with only local windows, improving the memory efficiency of the attention module. To this end, we propose new algorithmic innovations such as Causal Sinkhorn Balancing and SortCut, a dynamic sequence truncation method for tailoring Sinkhorn Attention to encoding and/or decoding purposes. Via extensive experiments on algorithmic seq2seq sorting, language modeling, pixel-wise image generation, document classification, and natural language inference, we demonstrate that our memory-efficient Sinkhorn attention method is competitive with vanilla attention and consistently outperforms recently proposed efficient Transformer models such as Sparse Transformers.
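To make the core idea concrete, the sketch below illustrates the general recipe the abstract describes: a small scoring network produces block-to-block scores, Sinkhorn (iterative row/column) normalization turns them into a soft permutation matrix, the soft permutation re-sorts blocks of the sequence, and attention is then computed only within local blocks of the re-sorted sequence. This is a minimal, single-head illustration of the mechanism, not the authors' implementation; the function names (sinkhorn_normalize, sparse_sinkhorn_attention), the mean-pooled block representation, and the linear scorer are assumptions made for the example, and causal balancing and SortCut are omitted.

```python
import torch
import torch.nn.functional as F


def sinkhorn_normalize(log_alpha, n_iters=8):
    # Alternately normalize rows and columns in log space so that
    # exp(log_alpha) approaches a doubly stochastic (soft permutation) matrix.
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)  # rows
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)  # columns
    return log_alpha.exp()


def sparse_sinkhorn_attention(x, block_size=4, n_sinkhorn_iters=8):
    """Single-head sketch: softly sort blocks, then attend within local blocks."""
    b, n, d = x.shape
    nb = n // block_size
    blocks = x.view(b, nb, block_size, d)        # (batch, num_blocks, block_size, dim)
    block_repr = blocks.mean(dim=2)              # one summary vector per block

    # Hypothetical "meta sorting network": a linear map giving block-to-block scores.
    scorer = torch.nn.Linear(d, nb)
    log_alpha = scorer(block_repr)               # (batch, num_blocks, num_blocks)
    perm = sinkhorn_normalize(log_alpha, n_sinkhorn_iters)

    # Differentiable re-sorting of blocks with the soft permutation.
    sorted_blocks = torch.einsum('bij,bjkd->bikd', perm, blocks)

    # Local attention: queries from the original blocks, keys/values from the
    # permuted blocks, so each local window can reach distant content.
    q, k, v = blocks, sorted_blocks, sorted_blocks
    scores = torch.einsum('bnqd,bnkd->bnqk', q, k) / d ** 0.5
    attn = F.softmax(scores, dim=-1)
    out = torch.einsum('bnqk,bnkd->bnqd', attn, v)
    return out.reshape(b, n, d)


# Toy usage: batch of 2 sequences, length 16, model dimension 8.
x = torch.randn(2, 16, 8)
y = sparse_sinkhorn_attention(x)
print(y.shape)  # torch.Size([2, 16, 8])
```

The point of the sketch is the memory argument: each query attends to only block_size keys rather than the full sequence, while the learned permutation decides which distant blocks become locally visible, giving quasi-global attention at local-window cost.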