Sparse Sinkhorn Attention
FOS: Computer and information sciences
Computer Science - Machine Learning
Computer Science - Computation and Language
Computation and Language (cs.CL)
Machine Learning (cs.LG)
DOI: 10.48550/arXiv.2002.11296
Publication Date: 2020-01-01
AUTHORS (5)
ABSTRACT
We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations. Concretely, we introduce a meta sorting network that learns to generate latent permutations over sequences. Given sorted sequences, we are then able to compute quasi-global attention with only local windows, improving the memory efficiency of the attention module. To this end, we propose new algorithmic innovations such as Causal Sinkhorn Balancing and SortCut, a dynamic sequence truncation method for tailoring Sinkhorn Attention to encoding and/or decoding purposes. Via extensive experiments on algorithmic seq2seq sorting, language modeling, pixel-wise image generation, document classification, and natural language inference, we demonstrate that our memory-efficient Sinkhorn attention method is competitive with vanilla attention and consistently outperforms recently proposed efficient Transformer models such as Sparse Transformers.
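To make the core idea concrete, the sketch below illustrates the general recipe the abstract describes: a small scoring network produces block-to-block scores, Sinkhorn (iterative row/column) normalization turns them into a soft permutation matrix, the soft permutation re-sorts blocks of the sequence, and attention is then computed only within local blocks of the re-sorted sequence. This is a minimal, single-head illustration of the mechanism, not the authors' implementation; the function names (sinkhorn_normalize, sparse_sinkhorn_attention), the mean-pooled block representation, and the linear scorer are assumptions made for the example, and causal balancing and SortCut are omitted.

```python
import torch
import torch.nn.functional as F


def sinkhorn_normalize(log_alpha, n_iters=8):
    # Alternately normalize rows and columns in log space so that
    # exp(log_alpha) approaches a doubly stochastic (soft permutation) matrix.
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)  # rows
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)  # columns
    return log_alpha.exp()


def sparse_sinkhorn_attention(x, block_size=4, n_sinkhorn_iters=8):
    """Single-head sketch: softly sort blocks, then attend within local blocks."""
    b, n, d = x.shape
    nb = n // block_size
    blocks = x.view(b, nb, block_size, d)        # (batch, num_blocks, block_size, dim)
    block_repr = blocks.mean(dim=2)              # one summary vector per block

    # Hypothetical "meta sorting network": a linear map giving block-to-block scores.
    scorer = torch.nn.Linear(d, nb)
    log_alpha = scorer(block_repr)               # (batch, num_blocks, num_blocks)
    perm = sinkhorn_normalize(log_alpha, n_sinkhorn_iters)

    # Differentiable re-sorting of blocks with the soft permutation.
    sorted_blocks = torch.einsum('bij,bjkd->bikd', perm, blocks)

    # Local attention: queries from the original blocks, keys/values from the
    # permuted blocks, so each local window can reach distant content.
    q, k, v = blocks, sorted_blocks, sorted_blocks
    scores = torch.einsum('bnqd,bnkd->bnqk', q, k) / d ** 0.5
    attn = F.softmax(scores, dim=-1)
    out = torch.einsum('bnqk,bnkd->bnqd', attn, v)
    return out.reshape(b, n, d)


# Toy usage: batch of 2 sequences, length 16, model dimension 8.
x = torch.randn(2, 16, 8)
y = sparse_sinkhorn_attention(x)
print(y.shape)  # torch.Size([2, 16, 8])
```

The point of the sketch is the memory argument: each query attends to only block_size keys rather than the full sequence, while the learned permutation decides which distant blocks become locally visible, giving quasi-global attention at local-window cost.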