Spike-EVPR: Deep Spiking Residual Network with Cross-Representation Aggregation for Event-Based Visual Place Recognition

DOI: 10.48550/arXiv.2402.10476 Publication Date: 2024-02-16
ABSTRACT
Event cameras have been successfully applied to visual place recognition (VPR) tasks by using deep artificial neural networks (ANNs) in recent years. However, previously proposed deep ANN architectures are often unable to harness the abundant temporal information presented in event streams. In contrast, deep spiking networks exhibit more intricate spatiotemporal dynamics and are inherently well-suited to process sparse asynchronous event streams. Unfortunately, directly inputting temporal-dense event volumes into the spiking network introduces excessive time steps, resulting in prohibitively high training costs for large-scale VPR tasks. To address the aforementioned issues, we propose a novel deep spiking network architecture called Spike-EVPR for event-based VPR tasks. First, we introduce two novel event representations tailored for SNN to fully exploit the spatio-temporal information from the event streams, and reduce the video memory occupation during training as much as possible. Then, to exploit the full potential of these two representations, we construct a Bifurcated Spike Residual Encoder (BSR-Encoder) with powerful representational capabilities to better extract high-level features from the two representations. Next, we introduce a Shared & Specific Descriptor Extractor (SSD-Extractor). This module is designed to extract the features shared between the two representations and the features specific to each. Finally, we propose a Cross-Descriptor Aggregation Module (CDA-Module) that fuses the above three features to generate a refined, robust global descriptor of the scene. Our experimental results indicate the superior performance of our Spike-EVPR compared to several existing EVPR pipelines on the Brisbane-Event-VPR and DDD20 datasets, with the average Recall@1 increased by 7.61% on Brisbane and 13.20% on DDD20.
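The pipeline sketched in the abstract (two representation branches, shared/specific descriptor extraction, cross-descriptor aggregation) can be illustrated with a minimal numerical sketch. This is an assumption-laden toy, not the paper's implementation: the linear-plus-ReLU `encode` stands in for the deep spiking BSR-Encoder, and the mean/residual split is only a placeholder for the SSD-Extractor and CDA-Module; all function names and dimensions here are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Global descriptors for place recognition are typically L2-normalized
    # so that similarity can be computed as a dot product.
    return x / (np.linalg.norm(x) + eps)

def encode(rep, weights):
    # Placeholder for one BSR-Encoder branch: a linear projection with ReLU
    # standing in for the deep spiking residual network (assumption).
    return np.maximum(weights @ rep, 0.0)

def global_descriptor(rep_a, rep_b, w_a, w_b):
    """Toy version of the SSD-Extractor + CDA-Module flow:
    one shared descriptor, two branch-specific descriptors,
    fused into a single global descriptor."""
    f_a, f_b = encode(rep_a, w_a), encode(rep_b, w_b)
    shared = 0.5 * (f_a + f_b)   # descriptor shared between representations
    spec_a = f_a - shared        # descriptor specific to branch A
    spec_b = f_b - shared        # descriptor specific to branch B
    fused = np.concatenate([shared, spec_a, spec_b])
    return l2_normalize(fused)

rng = np.random.default_rng(0)
rep_a, rep_b = rng.standard_normal(64), rng.standard_normal(64)  # two event representations
w_a, w_b = rng.standard_normal((32, 64)), rng.standard_normal((32, 64))
d = global_descriptor(rep_a, rep_b, w_a, w_b)
print(d.shape)  # (96,) — three 32-d descriptors concatenated
```

In the actual method the three descriptors are produced and fused by learned spiking modules; the point of the sketch is only the data flow from two input representations to one unit-norm global descriptor.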