Attention-shift based deep neural network for fine-grained visual categorization

Discriminative model; ENCODE; Visual processing; Feature (linguistics)
DOI: 10.1016/j.patcog.2021.107947 Publication Date: 2021-03-19T03:10:05Z
ABSTRACT
Fine-grained visual categorization (FGVC) has attracted extensive attention in recent years. The general pipeline of current FGVC techniques is to 1) locate the discriminative regions; 2) extract features from each region independently; and 3) feed the integrated features to a classifier. In this paper, we re-investigate the pipeline from the viewpoint of human visual recognition mechanisms. The human visual system (HVS) perceives discriminative regions as a temporal process via the attention-shift mechanism. However, the existing strategy of independent feature extraction and one-pass feeding ignores the inherent semantic relationships among discriminative regions and is thus ill-suited to modeling the attention-shift process. Therefore, we propose a novel end-to-end FGVC network structure named Attention-Shift based Deep Neural Network (AS-DNN) to locate the discriminative regions automatically and encode the semantic correlations iteratively. AS-DNN consists of two channels: 1) the global perception channel C_glb and 2) the attention-shift channel C_sft, simulating the global perception and the attention-shift mechanism, respectively. Experimental results show that AS-DNN achieves state-of-the-art performance, outperforming both weakly and strongly supervised CNN-based FGVC algorithms on several widely used fine-grained datasets, and visualization of the attention regions shows that the proposed method can locate the discriminative regions robustly under complex backgrounds and postures.
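The contrast the abstract draws, between pooling region features independently in one pass and encoding them iteratively in visiting order, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the saliency grid, the function names, the hand-made region feature, and the exponential-moving-average update are stand-ins and are not taken from AS-DNN, whose layers and update rule the abstract does not specify.

```python
# Toy sketch only: all names and update rules here are illustrative assumptions,
# not the AS-DNN architecture described in the paper.
from typing import List, Sequence, Tuple

Region = Tuple[int, int]  # (row, col) of a cell in a small saliency grid


def locate_regions(saliency: Sequence[Sequence[float]], k: int) -> List[Region]:
    """Return the k most salient cells, strongest first (stand-in for step 1)."""
    cells = [(v, (r, c)) for r, row in enumerate(saliency) for c, v in enumerate(row)]
    cells.sort(key=lambda item: -item[0])
    return [pos for _, pos in cells[:k]]


def region_feature(saliency: Sequence[Sequence[float]], region: Region) -> List[float]:
    """Hypothetical per-region feature: saliency value plus scaled position (step 2)."""
    r, c = region
    return [saliency[r][c], r / len(saliency), c / len(saliency[0])]


def one_pass_encoding(features: List[List[float]]) -> List[float]:
    """Conventional pooling: a plain average, so the visiting order is ignored."""
    n = len(features)
    return [sum(f[i] for f in features) / n for i in range(len(features[0]))]


def attention_shift_encoding(features: List[List[float]], decay: float = 0.5) -> List[float]:
    """Iterative encoding: each step mixes the running state with the next region,
    so the result depends on the order in which regions are visited."""
    state = [0.0] * len(features[0])
    for f in features:
        state = [decay * s + (1.0 - decay) * x for s, x in zip(state, f)]
    return state


if __name__ == "__main__":
    saliency = [[0.1, 0.9], [0.4, 0.2]]
    regions = locate_regions(saliency, k=2)
    feats = [region_feature(saliency, reg) for reg in regions]
    print("regions:", regions)
    print("one-pass:", one_pass_encoding(feats))
    print("iterative:", attention_shift_encoding(feats))
    print("iterative, reversed order:", attention_shift_encoding(feats[::-1]))
```

Reversing the region order leaves the one-pass average unchanged but changes the iterative encoding, which is the order sensitivity the abstract argues a one-pass strategy cannot capture. A learned recurrent unit would be a more realistic choice than the fixed moving average used here.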