MUSTAN: Multi-scale Temporal Context as Attention for Robust Video Foreground Segmentation
DOI: 10.48550/arxiv.2402.00918
Publication Date: 2024-02-01
AUTHORS (4)
ABSTRACT
Video foreground segmentation (VFS) is an important computer vision task wherein one aims to segment the objects under motion from the background. Most current methods are image-based, i.e., they rely only on spatial cues while ignoring motion cues. Therefore, they tend to overfit the training data and do not generalize well to out-of-domain (OOD) distributions. To address this problem, prior works exploited additional cues such as optical flow, background subtraction masks, etc. However, obtaining video data with annotations such as optical flow is a challenging task. In this paper, we utilize temporal information to improve OOD performance. The challenge lies in how to model the temporal information in an interpretable way that creates a very noticeable difference. We therefore devise a strategy that integrates temporal context into the development of VFS. Our approach gives rise to deep learning architectures, namely MUSTAN1 and MUSTAN2, based on the idea of multi-scale temporal context as attention, which aids our models in learning better representations beneficial for VFS. Further, we introduce a new dataset, the Indoor Surveillance Dataset (ISD). It has multiple frame-level annotations, i.e., binary masks, depth maps, and instance semantic annotations, so ISD can also benefit other computer vision tasks. We validate the efficacy of our architectures and compare their performance against baselines. We demonstrate that the proposed methods significantly outperform the benchmarks on OOD data. In addition, performance on certain video categories is improved due to ISD.
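The core idea of "multi-scale temporal context as attention" can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's actual MUSTAN1/MUSTAN2 architecture: the pooling windows, fusion rule, and gating function are all assumptions chosen for clarity.

```python
import numpy as np

def multiscale_temporal_attention(frames, scales=(2, 4, 8)):
    """Illustrative sketch (not the published architecture):
    pool temporal context at several scales, turn the fused
    context into an attention gate, and modulate the current
    frame's features with it.

    frames: (T, H, W, C) array of per-frame feature maps.
    Returns attended features for the last frame, shape (H, W, C).
    """
    T = frames.shape[0]
    current = frames[-1]                       # frame to be segmented
    contexts = []
    for s in scales:
        window = frames[max(0, T - s):]        # last `s` frames = one temporal scale
        contexts.append(window.mean(axis=0))   # average-pooled temporal context
    context = np.mean(contexts, axis=0)        # fuse scales (simple average here)

    # attention: per-location similarity between current features and
    # the multi-scale temporal context, squashed to (0, 1)
    logits = (current * context).sum(axis=-1, keepdims=True)
    attn = 1.0 / (1.0 + np.exp(-logits))       # sigmoid gate
    return current * attn                      # context-modulated features
```

In a full model, the gated features would feed a segmentation head; regions whose appearance is consistent with the temporal context (static background) receive different attention than moving foreground.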