VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization

DOI: 10.1609/aaai.v35i7.16767
Publication Date: 2022-09-08
ABSTRACT
Recent learning-based approaches have achieved impressive results in the field of single-shot camera localization. However, how best to fuse multiple modalities (e.g., image and depth) and to deal with degraded or missing input are less well studied. In particular, we note that previous approaches towards deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is because of the naive approaches to feature space fusion through summation or concatenation, which do not take into account the different strengths of each modality. To address this, we propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space through a variational Product-of-Experts (PoE) followed by attention-based fusion. Unlike previous multimodal variational works directly adapting the objective function of the vanilla variational auto-encoder, we show how camera localization can be accurately estimated through an unbiased objective function based on importance weighting. Our model is extensively evaluated on RGB-D datasets and the results prove the efficacy of our model. The source code is available at https://github.com/Zalex97/VMLoc.
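Below is a minimal sketch of the Gaussian Product-of-Experts fusion step mentioned in the abstract, assuming (as is common in multimodal variational models) that each modality encoder predicts a diagonal-Gaussian posterior over the shared latent space and that the fused posterior is the product of these Gaussians together with a unit prior. The function name poe_fuse and the toy inputs are illustrative assumptions, not the authors' released implementation.

import numpy as np

def poe_fuse(mus, logvars, eps=1e-8):
    """Fuse Gaussian experts N(mu_i, var_i) into a single Gaussian.

    mus, logvars: arrays of shape (num_experts, latent_dim).
    Returns the fused mean and log-variance, each of shape (latent_dim,).
    """
    var = np.exp(logvars) + eps              # per-expert variances
    precision = 1.0 / var                    # per-expert precisions
    fused_var = 1.0 / precision.sum(axis=0)  # product of Gaussians: precisions add
    fused_mu = fused_var * (mus * precision).sum(axis=0)
    return fused_mu, np.log(fused_var)

# Toy usage: an image expert, a depth expert, and a standard-normal prior expert.
latent_dim = 4
mu_img, logvar_img = np.random.randn(latent_dim), np.zeros(latent_dim)
mu_depth, logvar_depth = np.random.randn(latent_dim), np.zeros(latent_dim)
mu_prior, logvar_prior = np.zeros(latent_dim), np.zeros(latent_dim)

mu, logvar = poe_fuse(
    np.stack([mu_prior, mu_img, mu_depth]),
    np.stack([logvar_prior, logvar_img, logvar_depth]),
)
print(mu, logvar)

One practical property of this formulation is that a missing or degraded modality can simply be dropped from the product, which is one motivation for PoE-style fusion over plain summation or concatenation of features.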