EyEar: Learning Audio Synchronized Human Gaze Trajectory Based on Physics-Informed Dynamics
Audio-visual
DOI:
10.1609/aaai.v39i2.32133
Publication Date:
2025-04-11
AUTHORS (7)
ABSTRACT
Imitating how humans move their gaze in a visual scene is a vital research problem for both visual understanding and psychology, kindling crucial applications such as building alive virtual characters. Previous studies aim to predict gaze trajectories when humans are free-viewing an image, searching for required targets, or looking for clues to answer questions about an image. While these tasks focus on visual-centric scenarios, humans also move their gaze along with audio signal inputs in more common scenarios. To fill this gap, we introduce a new task that predicts human gaze trajectories synchronized with audio, and provide a dataset containing 20k gaze points from 8 subjects. To effectively integrate audio information and simulate the dynamic process of gaze motion, we propose a novel learning framework called EyEar (Eye moving while Ear listening) based on physics-informed dynamics, which considers three key factors to predict gazes: eye inherent motion tendency, vision salient attraction, and audio semantic attraction. We further propose a probability density score to overcome the high individual variability of gaze trajectories, thereby improving the stability of optimization and the reliability of evaluation. Experimental results show that EyEar outperforms all baselines on all evaluation metrics, thanks to the proposed components of the learning model.
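The abstract only names the three dynamics factors and the probability density score; the sketch below is a hypothetical, simplified illustration of how such a physics-informed gaze update and a density-based trajectory score could fit together, not the authors' implementation. The function names (gaze_step, probability_density_score), the weights alpha/beta/gamma, the Gaussian bandwidth sigma, and the assumption that per-frame saliency peaks and audio-grounded anchor points are available as 2D points are all illustrative choices made here.

```python
# Hypothetical sketch (not the authors' code): one physics-informed gaze update
# combining the three forces named in the abstract, plus a toy stand-in for a
# probability-density-based trajectory score.
import numpy as np


def gaze_step(gaze, velocity, saliency_peak, audio_anchor,
              alpha=0.6, beta=0.25, gamma=0.15, dt=1.0):
    """Advance the 2D gaze point by one time step.

    gaze, velocity  : current gaze position and velocity (image coordinates)
    saliency_peak   : location of the strongest visual saliency (assumed given)
    audio_anchor    : location of the object referred to by the current audio
                      segment (assumed given by some grounding module)
    alpha/beta/gamma: illustrative weights for the three forces
    """
    inertia = velocity                      # eye inherent motion tendency
    visual_pull = saliency_peak - gaze      # vision salient attraction
    audio_pull = audio_anchor - gaze        # audio semantic attraction
    new_velocity = alpha * inertia + beta * visual_pull + gamma * audio_pull
    return gaze + dt * new_velocity, new_velocity


def probability_density_score(pred_traj, subject_points, sigma=30.0):
    """Toy density score: mean log-density of predicted gaze points under an
    isotropic Gaussian mixture centred on gaze points observed from multiple
    subjects (sigma in pixels is an assumed value)."""
    scores = []
    for p in pred_traj:
        d2 = np.sum((subject_points - p) ** 2, axis=1)
        density = np.mean(np.exp(-d2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2))
        scores.append(np.log(density + 1e-12))
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaze, vel = np.array([320.0, 240.0]), np.zeros(2)
    traj = []
    for _ in range(10):
        gaze, vel = gaze_step(gaze, vel,
                              saliency_peak=np.array([400.0, 200.0]),
                              audio_anchor=np.array([250.0, 300.0]))
        traj.append(gaze)
    observed = rng.normal([320.0, 240.0], 40.0, size=(200, 2))
    print(probability_density_score(np.array(traj), observed))
```

In this toy version the gaze drifts toward a weighted compromise between the saliency peak and the audio-referenced location while retaining momentum; scoring against a mixture over many subjects' gaze points, rather than a single reference trajectory, is one way to tolerate the high individual variability the abstract mentions.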