Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory

FOS: Computer and information sciences; Machine Learning (cs.LG); Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2404.11870 Publication Date: 2024-04-17
ABSTRACT
We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data. PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities. PANM facilitates pointer assignment, dereference, and arithmetic by explicitly using physical pointers to access memory content. Remarkably, it can learn to perform these operations through end-to-end training on sequence data, powering various sequential models. Our experiments demonstrate PANM's exceptional length extrapolating capabilities and improved performance in tasks that require symbol processing, such as algorithmic reasoning and Dyck language recognition. PANM helps Transformer achieve up to 100% generalization accuracy in compositional learning tasks and significantly better results in mathematical reasoning, question answering and machine translation tasks.
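To make the three pointer operations named in the abstract concrete, the following is a minimal symbolic sketch and not PANM itself: PANM learns these operations end-to-end, whereas here they are hand-written. The function names (`dereference`, `advance`, `copy_with_pointer`), the use of integer list indices as "physical addresses", and the wrap-around addressing are all illustrative assumptions. The sketch shows why a rule expressed over pointers, rather than over content, carries over unchanged to sequences of any length.

```python
from typing import List, Sequence


def dereference(memory: Sequence[str], pointer: int) -> str:
    """Dereference: return the content stored at the address in `pointer`."""
    return memory[pointer]


def advance(pointer: int, step: int, size: int) -> int:
    """Pointer arithmetic: move by `step` slots.

    Wrap-around (modulo `size`) is an assumption made for this sketch.
    """
    return (pointer + step) % size


def copy_with_pointer(memory: Sequence[str]) -> List[str]:
    """Copy a sequence using only pointer assignment, dereference, arithmetic."""
    if not memory:
        return []
    pointer = 0  # pointer assignment: start at the first address
    output: List[str] = []
    for _ in range(len(memory)):
        output.append(dereference(memory, pointer))  # read content via pointer
        pointer = advance(pointer, 1, len(memory))   # step to the next address
    return output


if __name__ == "__main__":
    short = list("abc")
    long = list("abcdefgh") * 10  # a much longer input, same rule
    print(copy_with_pointer(short) == short)
    print(copy_with_pointer(long) == long)
```

Because the copy rule never inspects the stored symbols, only the addresses, the same procedure handles the 3-item and the 80-item input identically; this is the intuition behind separating pointer manipulation from memory content.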