Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

DOI: 10.48550/arxiv.2207.01622 Publication Date: 2022-01-01
ABSTRACT
In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} for four Ego4D challenge tasks: Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR). In particular, we exploit the recently released Ego4D dataset \cite{grauman2021ego4d} to pioneer Egocentric VLP in terms of pretraining dataset, pretraining objective, and development set. Based on these three designs, we develop a pretrained video-language model that can transfer its egocentric video-text representation or video-only representation to several downstream video tasks. Our Egocentric VLP achieves 10.46 R@1 & IoU@0.3 on NLQ, 10.33 mAP on MQ, 74% accuracy on OSCC, and 0.67 sec error on PNR. The code is available at https://github.com/showlab/EgoVLP.
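The pretraining objective behind such a video-language model is a contrastive loss that pulls paired video and text embeddings together in a shared space (the report's actual objective, EgoNCE, is an egocentric-aware variant of this idea). Below is a minimal NumPy sketch of the plain symmetric InfoNCE loss over a batch of paired embeddings; the function name, array shapes, and temperature value are illustrative assumptions, not taken from the report:

```python
import numpy as np

def info_nce(video_emb, text_emb, temperature=0.05):
    """Symmetric InfoNCE over a batch of paired video/text embeddings.

    video_emb, text_emb: (B, D) arrays; row i of each forms a positive pair,
    all other rows in the batch serve as negatives. Temperature is an
    illustrative choice, not the paper's setting.
    """
    # L2-normalize so dot products are cosine similarities
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature      # (B, B) similarity matrix
    idx = np.arange(len(logits))        # diagonal entries are the positives

    def xent(l):
        # cross-entropy with the diagonal as target, numerically stabilized
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average the video-to-text and text-to-video directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
B, D = 4, 16
loss = info_nce(rng.normal(size=(B, D)), rng.normal(size=(B, D)))
print(float(loss))
```

In the real model the embeddings come from a video encoder and a text encoder trained jointly; here random arrays simply exercise the loss. Matched pairs (identical embeddings) yield a much lower loss than random pairings, which is what drives the representations together during pretraining.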