The Blessings of Unlabeled Background in Untrimmed Videos
DOI:
10.48550/arxiv.2103.13183
Publication Date:
2021-01-01
AUTHORS (6)
ABSTRACT
Weakly-supervised Temporal Action Localization (WTAL) aims to detect action segments with only video-level action labels in training. The key challenge is how to distinguish the action of interest from the background, which is unlabelled even at the video level. While previous works treat the background as "curses", we consider it "blessings". Specifically, we first use causal analysis to point out that the common localization errors are due to an unobserved confounder that resides ubiquitously in visual recognition. Then, we propose a Temporal Smoothing PCA-based (TS-PCA) deconfounder, which exploits the unlabelled background to model an observed substitute for the confounder, in order to remove the confounding effect. Note that the proposed deconfounder is model-agnostic and non-intrusive, and hence can be applied to any WTAL method without model re-designs. Through extensive experiments on four state-of-the-art WTAL methods, we show that the deconfounder can improve all of them on two public datasets: THUMOS-14 and ActivityNet-1.3.
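The abstract names two ingredients, temporal smoothing and PCA, without giving the algorithm. The following is only a minimal sketch of how those two generic steps could be chained on per-snippet video features, and it is not the authors' implementation. The function names, the moving-average window, the number of components, and the choice to smooth before projecting are all assumptions made for illustration; how the resulting substitute is used inside a WTAL model is not specified here.

```python
import numpy as np


def temporal_smooth(features, window=5):
    """Moving-average smoothing along the time axis (assumed smoothing scheme).

    features: array of shape (T, D), one D-dimensional feature per snippet.
    window: odd window length; edges are handled by repeating the end frames.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    pad = window // 2
    padded = np.pad(features, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    # Convolve each feature dimension independently; 'valid' restores length T.
    columns = [np.convolve(padded[:, d], kernel, mode="valid")
               for d in range(features.shape[1])]
    return np.stack(columns, axis=1)


def pca_project(features, n_components=2):
    """Project centered features onto their top principal directions via SVD."""
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def ts_pca_sketch(features, window=5, n_components=2):
    """Hypothetical pipeline: smooth in time, then reduce with PCA.

    Returns a (T, n_components) array that plays the role of a
    low-dimensional per-snippet substitute in this illustration only.
    """
    smoothed = temporal_smooth(features, window=window)
    return pca_project(smoothed, n_components=n_components)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    snippet_features = rng.normal(size=(20, 8))  # 20 snippets, 8-dim features
    substitute = ts_pca_sketch(snippet_features, window=5, n_components=3)
    print(substitute.shape)
```

Because the sketch only consumes a feature matrix and returns another array, it stays outside any particular localization model, which is in the spirit of the "model-agnostic and non-intrusive" claim, though the paper's actual integration may differ.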