GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation

DOI: 10.48550/arxiv.2305.17096 Publication Date: 2023-01-01
ABSTRACT
Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, the degradation of representation and noise accumulation of online methods, especially during occlusion and abrupt changes in the scene, pose substantial challenges. Transformer-based query propagation provides promising directions at the cost of quadratic memory attention. Moreover, it is susceptible to degraded instance features due to the above-mentioned challenges and suffers from cascading effects. The detection and rectification of such errors remain largely underexplored. To this end, we introduce \textbf{GRAtt-VIS}, \textbf{G}ated \textbf{R}esidual \textbf{Att}ention for \textbf{V}ideo \textbf{I}nstance \textbf{S}egmentation. Firstly, we leverage a Gumbel-Softmax-based gate to detect possible errors in the current frame. Next, based on the gate activation, we rectify degraded features from their past representation. Such a residual configuration alleviates the need for dedicated memory and provides a continuous stream of relevant instance features. Secondly, we propose a novel inter-instance interaction using gate activation as a mask for self-attention. This masking strategy dynamically restricts unrepresentative instance queries in self-attention and preserves vital information for long-term tracking. We refer to this combination of the Gated Residual Connection and Masked Self-Attention as the \textbf{GRAtt} block, which can easily be integrated into an existing propagation-based framework. Further, GRAtt blocks significantly reduce the attention overhead and simplify dynamic temporal modeling. GRAtt-VIS achieves state-of-the-art performance on YouTube-VIS and the highly challenging OVIS dataset, significantly improving over previous methods. Code is available at \url{https://github.com/Tanveer81/GRAttVIS}.
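The two ideas in the abstract — a Gumbel-Softmax gate that either keeps a current-frame instance query or falls back to its propagated past representation, and a self-attention mask built from that same gate — can be illustrated with a minimal NumPy sketch. This is only a toy forward pass under assumed shapes (N instance queries of dimension D, a two-state gate per query); the function names, the 2-logit gate parameterization, and all shapes are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0, hard=True):
    """Sample from a Gumbel-Softmax (Concrete) distribution over the
    two gate states: "keep current feature" vs. "use past feature"."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-10) + 1e-10)
    y = np.exp((logits + g) / tau)
    y = y / y.sum(axis=-1, keepdims=True)
    if hard:  # hard one-hot sample (forward pass only in this sketch)
        one_hot = np.zeros_like(y)
        one_hot[np.arange(len(y)), y.argmax(-1)] = 1.0
        return one_hot
    return y

def gated_residual(current_q, past_q, gate_logits, tau=1.0):
    """Per-query binary gate: 1 keeps the current-frame query,
    0 rectifies it by substituting its past representation."""
    gate = gumbel_softmax(gate_logits, tau)[:, 0:1]  # (N, 1); col 0 = "keep"
    return gate * current_q + (1.0 - gate) * past_q, gate.squeeze(-1)

def masked_self_attention(q, gate):
    """Scaled dot-product self-attention in which queries whose gate
    fired 0 (judged unrepresentative) are masked out as keys."""
    d = q.shape[-1]
    scores = q @ q.T / np.sqrt(d)          # (N, N) attention logits
    scores[:, gate == 0] = -1e9            # hide degraded instances
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w = w / w.sum(-1, keepdims=True)
    return w @ q

N, D = 4, 8                                # 4 instance queries, dim 8
current_q = rng.normal(size=(N, D))
past_q = rng.normal(size=(N, D))           # propagated from the previous frame
gate_logits = rng.normal(size=(N, 2))      # predicted per query in practice

rectified, gate = gated_residual(current_q, past_q, gate_logits)
out = masked_self_attention(rectified, gate)
print(out.shape)                           # (4, 8)
```

With a hard gate, each output row of `gated_residual` is exactly the current or the past query, which is what lets the same binary signal double as a key mask in the self-attention step.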