START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation

FOS: Computer and information sciences; Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2410.16020
Publication Date: 2024-10-21
ABSTRACT
Domain Generalization (DG) aims to enable models to generalize to unseen target domains by learning from multiple source domains. Existing DG methods primarily rely on convolutional neural networks (CNNs), which inherently learn texture biases due to their limited receptive fields, making them prone to overfitting the source domains. While some works have introduced transformer-based methods (ViTs) for DG to leverage the global receptive field, these methods incur high computational costs due to the quadratic complexity of self-attention. Recently, advanced state space models (SSMs), represented by Mamba, have shown promising results in supervised learning tasks, achieving linear complexity in sequence length during training and fast RNN-like computation during inference. Inspired by this, we investigate the generalization ability of the Mamba model under domain shifts and find that the input-dependent matrices within SSMs can accumulate and amplify domain-specific features, thus hindering model generalization. To address this issue, we propose a novel SSM-based architecture with a saliency-based token-aware transformation (namely START), which achieves state-of-the-art (SOTA) performance and offers a competitive alternative to CNNs and ViTs. START can selectively perturb and suppress domain-specific features in the salient tokens of SSMs, effectively reducing the discrepancy between different domains. Extensive experiments on five benchmarks demonstrate that START outperforms existing SOTA DG methods with efficient linear complexity. The code is available at https://github.com/lingeringlight/START.
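The core idea described above, identifying salient tokens (those most likely to carry amplified domain-specific features) and perturbing their features, can be illustrated with a simplified numpy sketch. This is not the paper's actual algorithm: the saliency proxy (token feature norm), the selection ratio, and the mixing-based perturbation are all illustrative assumptions standing in for START's saliency computation inside the SSM's input-dependent matrices.

```python
import numpy as np

def saliency_token_perturb(x, saliency, ratio=0.25, alpha=0.5, rng=None):
    """Perturb the top-`ratio` fraction of tokens ranked by `saliency`.

    x        : (L, D) array of token features.
    saliency : (L,) per-token saliency scores (here: any nonnegative proxy).
    ratio    : fraction of tokens treated as salient (illustrative choice).
    alpha    : mixing strength toward randomly chosen donor tokens,
               a simple surrogate for suppressing domain-specific content.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    L, _ = x.shape
    k = max(1, int(L * ratio))
    idx = np.argsort(saliency)[-k:]          # indices of most salient tokens
    donors = rng.integers(0, L, size=k)      # random donor tokens for mixing
    out = x.copy()
    # Interpolate salient-token features toward donor-token features,
    # weakening whatever token-specific (domain-specific) signal they carry.
    out[idx] = (1.0 - alpha) * x[idx] + alpha * x[donors]
    return out
```

In this toy version, non-salient tokens pass through unchanged, while the selected tokens are softened by interpolation; the paper's method instead targets the domain-specific components accumulated in the SSM's input-dependent matrices.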