NFDI4DS | UHH-SEMS - Publication Details

Acquisition through My Eyes and Steps: A Joint Predictive Agent Model in Egocentric Worlds

FOS: Computer and information sciences; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

DOI: 10.48550/arxiv.2502.05857

Publication Date: 2025-01-01

AUTHORS (9)

Chen, Lu

Wang, Yizhou

Tang, Shixiang

Ma, Qianhong

He, Tong

Ouyang, Wanli

Zhou, Xiaowei

Bao, Hujun

Peng, Sida

ABSTRACT

This paper addresses the task of learning an agent model that behaves like humans and can jointly perceive, predict, and act in egocentric worlds. Previous methods usually train separate models for these three abilities, which creates information silos that prevent the abilities from learning from each other and collaborating effectively. We propose a joint predictive agent model, named EgoAgent, that simultaneously learns to represent the world, predict future states, and take reasonable actions with a single transformer. EgoAgent unifies the representational spaces of the three abilities by mapping them all into a sequence of continuous tokens. Learnable query tokens are appended to this sequence to obtain current states, future states, and next actions. With joint supervision, the agent model establishes the internal relationships among the three abilities and effectively mimics human inference and learning processes. Comprehensive evaluations of EgoAgent on image classification, egocentric future state prediction, and 3D human motion prediction demonstrate the superiority of our method. The code and trained model will be released for reproducibility.
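The token layout described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, whose code had not been released at the time of this record. It only shows one plausible way to place observation and action tokens in a single continuous sequence, append three query tokens, and read outputs back at the query positions. The token width `D`, the interleaving order, the names `state_query`, `future_query`, and `action_query`, and the single identity-projection attention pass standing in for the transformer are all assumptions made for illustration.

```python
import math
import random

D = 4  # hypothetical shared token width


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens):
    """One single-head self-attention pass with identity projections.

    A stand-in for the transformer; a real model would use learned
    projections, multiple heads, and many layers.
    """
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(len(q))])
    return out


def build_sequence(obs_tokens, act_tokens, queries):
    """Interleave per-step observation and action tokens, then append queries.

    The interleaving order is an assumption; the abstract only states that
    everything is mapped into one sequence of continuous tokens.
    """
    seq, roles = [], []
    for o, a in zip(obs_tokens, act_tokens):
        seq += [o, a]
        roles += ["obs", "act"]
    for name, q in queries.items():
        seq.append(q)
        roles.append(name)
    return seq, roles


rng = random.Random(0)


def rand_vec():
    return [rng.uniform(-1.0, 1.0) for _ in range(D)]


# Three time steps of (already embedded) egocentric frames and body actions.
obs_tokens = [rand_vec() for _ in range(3)]
act_tokens = [rand_vec() for _ in range(3)]

# Query tokens; these would be learnable parameters in a trained model.
queries = {
    "state_query": rand_vec(),
    "future_query": rand_vec(),
    "action_query": rand_vec(),
}

seq, roles = build_sequence(obs_tokens, act_tokens, queries)
hidden = attend(seq)

# Read the three outputs back at the positions of the appended queries.
readout = {role: hidden[i] for i, role in enumerate(roles)
           if role.endswith("_query")}
print(roles)
print({name: len(vec) for name, vec in readout.items()})
```

In a trained model, each readout vector would presumably feed a task-specific head (current-state representation, future-state prediction, next-action prediction), and joint supervision would mean summing the losses of all three heads over the shared backbone. That wiring is likewise an assumption and is not confirmed by the abstract.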
