Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
FOS: Computer and information sciences
Computer Science - Machine Learning
Computer Science - Computation and Language
0202 electrical engineering, electronic engineering, information engineering
02 engineering and technology
Computation and Language (cs.CL)
Machine Learning (cs.LG)
DOI:
10.48550/arxiv.1912.11637
Publication Date:
2019-01-01
AUTHORS (6)
ABSTRACT
The self-attention based Transformer has demonstrated state-of-the-art performance on a number of natural language processing tasks. Self-attention is able to model long-term dependencies, but it may suffer from the extraction of irrelevant information in the context. To tackle this problem, we propose a novel model called \textbf{Explicit Sparse Transformer}. Explicit Sparse Transformer improves the concentration of attention on the global context through an explicit selection of the most relevant segments. Extensive experimental results on a series of natural language processing and computer vision tasks, including neural machine translation, image captioning, and language modeling, all demonstrate the advantages of Explicit Sparse Transformer in model performance. We also show that our proposed sparse attention method achieves comparable or better results than the previous sparse attention method, but significantly reduces training and testing time. For example, the inference speed is twice that of the sparsemax model. Code will be available at \url{https://github.com/lancopku/Explicit-Sparse-Transformer}
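The "explicit selection" described in the abstract amounts to keeping only the top-k attention scores per query and masking out the rest before the softmax. Below is a minimal sketch of such top-k scaled dot-product attention in PyTorch; the function name sparse_attention, the tensor shapes, and the default k are illustrative assumptions and do not reproduce the authors' released implementation.

    # Sketch: top-k ("explicit sparse") scaled dot-product attention.
    # Assumed shapes: q, k, v are (batch, heads, seq_len, head_dim).
    import math
    import torch
    import torch.nn.functional as F

    def sparse_attention(q, k, v, topk=8):
        # Standard scaled dot-product scores.
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
        # Keep only the top-k scores per query; mask the rest to -inf so the
        # softmax assigns them zero weight, concentrating attention.
        topk = min(topk, scores.size(-1))
        kth_score = scores.topk(topk, dim=-1).values[..., -1:]  # k-th largest per query
        masked = scores.masked_fill(scores < kth_score, float("-inf"))
        weights = F.softmax(masked, dim=-1)
        return torch.matmul(weights, v)

    # Usage on toy data (self-attention, so q = k = v here).
    q = torch.randn(2, 4, 10, 16)
    out = sparse_attention(q, q, q, topk=5)
    print(out.shape)  # torch.Size([2, 4, 10, 16])

Unlike sparsemax-style methods, this selection requires only a top-k operation and a mask on top of standard attention, which is consistent with the reduced training and inference time claimed in the abstract.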