
LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba

Occupancy

DOI: 10.1609/aaai.v39i3.32264

Publication Date: 2025-04-11T09:46:50Z


AUTHORS (4)

Yubo Cui

Zhiheng Li

Jiaqiang Wang

Zheng Fang

ABSTRACT

Vision-based 3D occupancy prediction has become a popular research task due to its versatility and affordability. Conventional methods usually project image-based vision features into 3D space and learn geometric information through an attention mechanism, enabling 3D semantic occupancy prediction. However, these works face two main challenges: 1) Limited geometric information. Due to the lack of geometric information in the image itself, it is challenging to directly predict 3D space information, especially in large-scale outdoor scenes. 2) Local restricted interaction. Due to the quadratic complexity of the attention mechanism, they often use modified local attention to fuse features, resulting in a restricted fusion. To address these problems, in this paper we propose a language-assisted 3D semantic occupancy prediction network, named LOMA. In the proposed vision-language framework, we first introduce a VL-aware Scene Generator (VSG) module to generate the 3D language feature of the scene. By leveraging a vision-language model, this module provides implicit geometric knowledge and explicit semantic information from language. Furthermore, we present a Tri-plane Fusion Mamba (TFM) block to efficiently fuse the 3D language feature and the 3D vision feature. The proposed module not only fuses the two features with global modeling but also avoids excessive computation cost. Experiments on the SemanticKITTI and SSCBench-KITTI360 datasets show that our algorithm achieves new state-of-the-art performance in both geometric and semantic completion tasks. Our code will be released soon.

