Active Object Detection with Knowledge Aggregation and Distillation from Large Models

FOS: Computer and information sciences; Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2405.12509
Publication Date: 2024-05-21
ABSTRACT
Accurately detecting active objects undergoing state changes is essential for comprehending human interactions and facilitating decision-making. Existing methods for active object detection (AOD) primarily rely on the visual appearance of the objects within the input, such as changes in size, shape, and relationship with hands. However, these visual changes can be subtle, posing challenges, particularly in scenarios with multiple distracting no-change instances of the same category. We observe that state changes are often the result of an interaction being performed upon the object, and thus propose to use informed priors about object-related plausible interactions (including semantics and visual appearance) to provide more reliable cues for AOD. Specifically, we propose a knowledge aggregation procedure to integrate the aforementioned informed priors into oracle queries within the teacher decoder, offering more object affordance commonsense to locate the active object. To streamline the inference process and reduce extra knowledge inputs, we propose a knowledge distillation approach that encourages the student decoder to mimic the detection capabilities of the teacher decoder using the oracle query by replicating its predictions and attention. Our proposed framework achieves state-of-the-art performance on four datasets, namely Ego4D, Epic-Kitchens, MECCANO, and 100DOH, which demonstrates the effectiveness of our approach in improving AOD.
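The distillation idea in the abstract (a student decoder replicating the teacher decoder's predictions and attention) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the loss, so the KL-divergence prediction term, the mean-squared-error attention term, the `attn_weight` parameter, and all function names here are assumptions made for the example.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def attention_mse(teacher_attn, student_attn):
    """Mean squared error between two attention maps (lists of rows)."""
    diffs = [
        (t - s) ** 2
        for t_row, s_row in zip(teacher_attn, student_attn)
        for t, s in zip(t_row, s_row)
    ]
    return sum(diffs) / len(diffs)


def distillation_loss(teacher_logits, student_logits,
                      teacher_attn, student_attn, attn_weight=1.0):
    """Hypothetical loss: the student imitates teacher predictions and attention.

    Assumption: prediction imitation is modeled with KL divergence and
    attention imitation with MSE; the paper may use different terms.
    """
    pred_term = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    attn_term = attention_mse(teacher_attn, student_attn)
    return pred_term + attn_weight * attn_term


# Toy inputs: one query, two classes, a 2x2 attention map.
teacher_logits = [2.0, 0.0]
student_logits = [0.0, 2.0]
teacher_attn = [[0.9, 0.1], [0.2, 0.8]]
student_attn = [[0.5, 0.5], [0.5, 0.5]]
loss = distillation_loss(teacher_logits, student_logits,
                         teacher_attn, student_attn)
```

Under these assumptions the loss is zero when the student reproduces the teacher exactly and grows as either its predictions or its attention diverge. At inference time only the student would be used, so the extra knowledge inputs feeding the teacher's oracle queries are no longer needed.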