Element-Wise Attention Layers: An Option for Optimization
FOS: Computer and information sciences
Computer Science - Machine Learning
Artificial Intelligence (cs.AI)
Computer Science - Artificial Intelligence
Computer Vision and Pattern Recognition (cs.CV)
Computer Science - Computer Vision and Pattern Recognition
0202 electrical engineering, electronic engineering, information engineering
02 engineering and technology
Machine Learning (cs.LG)
DOI:
10.2139/ssrn.4460607
Publication Date:
2023-05-26T19:19:01Z
AUTHORS (2)
ABSTRACT
The use of Attention Layers has become a trend since the popularization of the Transformer-based models, being the key element for many state-of-the-art models that have been developed through recent years. However, one of the biggest obstacles in implementing these architectures - as well as many others in Deep Learning Field - is the enormous amount of optimizing parameters they possess, which make its use conditioned on the availability of robust hardware. In this paper, it's proposed a new method of attention mechanism that adapts the Dot-Product Attention, which uses matrices multiplications, to become element-wise through the use of arrays multiplications. To test the effectiveness of such approach, two models (one with a VGG-like architecture and one with the proposed method) have been trained in a classification task using Fashion MNIST and CIFAR10 datasets. Each model has been trained for 10 epochs in a single Tesla T4 GPU from Google Colaboratory. The results show that this mechanism allows for an accuracy of 92% of the VGG-like counterpart in Fashion MNIST dataset, while reducing the number of parameters in 97%. For CIFAR10, the accuracy is still equivalent to 60% of the VGG-like counterpart while using 50% less parameters.
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES (0)
CITATIONS (0)
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....