Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning

DOI: 10.1609/aaai.v33i01.33018965 Publication Date: 2019-08-19T07:46:17Z
ABSTRACT
Although promising results have been achieved in video captioning, existing models are limited to the fixed inventory of activities in the training corpus and do not generalize to open-vocabulary scenarios. Here we introduce a novel task, zero-shot video captioning, that aims at describing out-of-domain videos of unseen activities. Videos of different activities usually require different captioning strategies in many aspects, i.e., word selection, semantic construction, style expression, etc., which poses a great challenge to depicting novel activities without paired training data. But meanwhile, similar activities share some of those aspects in common. Therefore, we propose a principled Topic-Aware Mixture of Experts (TAMoE) model for zero-shot video captioning that learns to compose different experts based on different topic embeddings, implicitly transferring the knowledge learned from seen activities to unseen ones. Besides, we leverage external topic-related text corpora to construct the topic embedding for each activity, which embodies the most relevant semantic vectors within the topic. Empirical results not only validate the effectiveness of our method in utilizing semantic knowledge, but also show its strong generalization ability when describing novel activities.
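To make the two ideas in the abstract concrete, the sketch below shows (a) a topic embedding built by averaging topic-related word vectors and (b) a mixture-of-experts layer whose composition weights are predicted from that topic embedding. This is a minimal illustration under assumed simplifications (linear experts, toy random word vectors), not the authors' implementation; all names such as TopicAwareMoE and topic_embedding are hypothetical.

```python
# Minimal sketch (not the paper's code): compose experts via topic-derived gates.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def topic_embedding(topic_words, word_vectors):
    """Topic embedding from external topic-related text: here simply the
    mean of the word vectors relevant to the topic (an assumption)."""
    return np.mean([word_vectors[w] for w in topic_words], axis=0)

class TopicAwareMoE:
    """Combines expert outputs with weights predicted from the topic
    embedding, so an unseen topic reuses experts learned on seen ones."""
    def __init__(self, dim, n_experts, topic_dim):
        self.experts = [rng.normal(size=(dim, dim)) * 0.1 for _ in range(n_experts)]
        self.gate = rng.normal(size=(topic_dim, n_experts)) * 0.1

    def __call__(self, video_feat, topic_emb):
        weights = softmax(topic_emb @ self.gate)           # expert composition weights
        outputs = np.stack([video_feat @ W for W in self.experts])
        return np.tensordot(weights, outputs, axes=1)      # weighted combination

# Toy usage with hypothetical word vectors and topic words.
word_vectors = {w: rng.normal(size=16) for w in ["surf", "wave", "board", "ocean"]}
t_emb = topic_embedding(["surf", "wave", "board"], word_vectors)
moe = TopicAwareMoE(dim=32, n_experts=4, topic_dim=16)
print(moe(rng.normal(size=32), t_emb).shape)  # -> (32,)
```

In this sketch the gate depends only on the topic embedding, which is what lets knowledge transfer implicitly: a novel activity with a nearby topic embedding yields a similar expert composition.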