Transformers Can Do Arithmetic with the Right Embeddings

FOS: Computer and information sciences
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
DOI: 10.48550/arxiv.2405.17399
Publication Date: 2024-05-27
ABSTRACT
The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that by training on only 20-digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100-digit addition problems. Finally, these gains in numeracy also unlock improvements on other multi-step reasoning tasks, including sorting and multiplication.
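
The central idea in the abstract, an extra embedding for each digit that encodes its offset from the start of the number it belongs to, can be illustrated with a short sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name DigitPositionEmbedding, the digit-id set, the max_digits cap, and the toy vocabulary are all illustrative choices, and how the extra embedding interacts with the model's standard positional scheme is left out.

```python
# Minimal sketch (illustrative, not the paper's code): add a learned embedding
# to each digit token, indexed by its 0-based offset within the contiguous run
# of digits (i.e., the number) it sits in. Non-digit tokens get offset 0 here
# for simplicity.
import torch
import torch.nn as nn


class DigitPositionEmbedding(nn.Module):
    def __init__(self, d_model: int, max_digits: int = 128):
        super().__init__()
        self.embed = nn.Embedding(max_digits, d_model)

    @staticmethod
    def digit_offsets(token_ids: torch.Tensor, digit_ids: set) -> torch.Tensor:
        """Offset of each digit token from the start of its digit run; 0 for non-digits."""
        offsets = torch.zeros_like(token_ids)
        for b in range(token_ids.size(0)):
            run = 0
            for i in range(token_ids.size(1)):
                run = run + 1 if int(token_ids[b, i]) in digit_ids else 0
                offsets[b, i] = max(run - 1, 0)
        return offsets

    def forward(self, token_embeds: torch.Tensor, token_ids: torch.Tensor,
                digit_ids: set) -> torch.Tensor:
        offsets = self.digit_offsets(token_ids, digit_ids)
        offsets = offsets.clamp(max=self.embed.num_embeddings - 1)
        return token_embeds + self.embed(offsets)


if __name__ == "__main__":
    # Toy vocabulary (an assumption): ids 0-9 are digits '0'-'9', id 10 is '+'.
    vocab_size, d_model = 16, 32
    tok_embed = nn.Embedding(vocab_size, d_model)
    x = torch.tensor([[1, 2, 3, 10, 4, 5]])   # encodes "123+45"
    pos = DigitPositionEmbedding(d_model)
    h = pos(tok_embed(x), x, digit_ids=set(range(10)))
    print(h.shape)  # torch.Size([1, 6, 32]); digit offsets used: 0, 1, 2, _, 0, 1
```

Giving every digit an explicit within-number offset is what lets the model line up corresponding digits of the operands regardless of where the numbers appear in the sequence; the other modifications the abstract mentions (input injection, recurrent layers) are architectural changes layered on top of this positional fix.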