VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization
DOI: 10.1609/aaai.v35i7.16767
Publication Date: 2022-09-08T18:53:07Z
AUTHORS (6)
ABSTRACT
Recent learning-based approaches have achieved impressive results in the field of single-shot camera localization. However, how best to fuse multiple modalities (e.g., image and depth) and how to deal with degraded or missing input are less well studied. In particular, we note that previous approaches to deep fusion do not perform significantly better than models employing a single modality. We conjecture this is because of naive fusion in feature space through summation or concatenation, which does not take into account the different strengths of each modality. To address this, we propose an end-to-end framework, termed VMLoc, that fuses different sensor inputs into a common latent space through a variational Product-of-Experts (PoE) followed by attention-based fusion. Unlike previous multimodal variational works that directly adapt the objective function of the vanilla variational auto-encoder, we show how camera localization can be accurately estimated through an unbiased objective function based on importance weighting. Our model is extensively evaluated on RGB-D datasets and the results prove the efficacy of our model. The source code is available at https://github.com/Zalex97/VMLoc.
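As a reading aid, below is a minimal sketch of Gaussian Product-of-Experts fusion, the mechanism the abstract describes for combining per-modality latent estimates (e.g., image and depth) into a common latent space. It is not the authors' implementation; the function and variable names are illustrative assumptions. Each modality contributes a Gaussian posterior, and a standard-normal prior expert keeps the fused distribution well defined when a modality is degraded or missing.

# Illustrative sketch of Gaussian Product-of-Experts (PoE) fusion; not the VMLoc source code.
import numpy as np

def product_of_experts(mus, logvars):
    # Fuse per-modality Gaussian posteriors N(mu_i, var_i) into a single Gaussian.
    # mus, logvars: arrays of shape (num_modalities, latent_dim).
    # Prepend a standard-normal prior expert (mean 0, log-variance 0) so the
    # fusion remains defined even when every modality is missing or degraded.
    mus = np.vstack([np.zeros_like(mus[:1]), mus])
    logvars = np.vstack([np.zeros_like(logvars[:1]), logvars])
    precisions = np.exp(-logvars)                      # 1 / var_i per expert
    fused_var = 1.0 / precisions.sum(axis=0)           # combined variance
    fused_mu = fused_var * (mus * precisions).sum(axis=0)  # precision-weighted mean
    return fused_mu, fused_var

# Example: fuse an RGB expert and a depth expert in a 4-D latent space.
rgb_mu, rgb_logvar = np.random.randn(1, 4), np.zeros((1, 4))
depth_mu, depth_logvar = np.random.randn(1, 4), np.zeros((1, 4))
mu, var = product_of_experts(np.vstack([rgb_mu, depth_mu]),
                             np.vstack([rgb_logvar, depth_logvar]))
print(mu.shape, var.shape)  # (4,) (4,)

The attention-based fusion and the importance-weighted localization objective mentioned in the abstract operate on top of such a fused latent; see the linked repository for the actual model.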