Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification

FOS: Computer and information sciences; Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
DOI: 10.48550/arXiv.2403.18228
Publication Date: 2024-03-26
ABSTRACT
The energy-efficient spikformer has been proposed by integrating the biologically plausible spiking neural network (SNN) and the artificial Transformer, whereby Spiking Self-Attention (SSA) is used to achieve both higher accuracy and lower computational cost. However, it seems that self-attention is not always necessary, especially in sparse spike-form calculation manners. In this paper, we innovatively replace vanilla SSA (using dynamic bases calculated from Query and Key) with spike-form Fourier Transform, Wavelet Transform, and their combinations (using fixed triangular or wavelet bases), based on the key hypothesis that both of them use a set of basis functions for information transformation. Hence, the Fourier-or-Wavelet-based spikformer (FWformer) is verified on visual classification tasks, including both static image and event-based video datasets. The FWformer can achieve comparable or even higher accuracies ($0.4\%$-$1.5\%$), higher running speed ($9\%$-$51\%$ for training and $19\%$-$70\%$ for inference), reduced theoretical energy consumption ($20\%$-$25\%$), and reduced GPU memory usage ($4\%$-$26\%$), compared to the standard spikformer. Our result indicates that the continuous refinement of new Transformers, inspired either by biological discovery (spike-form) or information theory (Fourier or Wavelet Transform), is promising.