VFIMamba: Video Frame Interpolation with State Space Models

Keywords: Interpolation; State-space representation
DOI: 10.48550/arxiv.2407.02315 Publication Date: 2024-07-02
ABSTRACT
Inter-frame modeling is pivotal in generating intermediate frames for video frame interpolation (VFI). Current approaches predominantly rely on convolution or attention-based models, which often either lack sufficient receptive fields or entail significant computational overheads. Recently, Selective State Space Models (S6) have emerged, tailored specifically for long sequence modeling, offering both linear complexity and data-dependent modeling capabilities. In this paper, we propose VFIMamba, a novel frame interpolation method for efficient and dynamic inter-frame modeling by harnessing the S6 model. Our approach introduces the Mixed-SSM Block (MSB), which initially rearranges tokens from adjacent frames in an interleaved fashion and subsequently applies multi-directional S6 modeling. This design facilitates the efficient transmission of information across frames while upholding linear complexity. Furthermore, we introduce a novel curriculum learning strategy that progressively cultivates proficiency in modeling inter-frame dynamics across varying motion magnitudes, fully unleashing the potential of the S6 model. Experimental findings showcase that our method attains state-of-the-art performance across diverse benchmarks, particularly excelling in high-resolution scenarios. In particular, on the X-TEST dataset, VFIMamba demonstrates a noteworthy improvement of 0.80 dB for 4K frames and 0.96 dB for 2K frames.
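The token rearrangement described for the Mixed-SSM Block can be illustrated with a minimal sketch. The abstract does not specify the exact interleaving pattern or the set of scan directions, so the column-wise interleaving and the four scan orders below are assumptions, and the function names `interleave_tokens` and `scan_orders` are hypothetical. The S6 model itself is not implemented here; the sketch only shows how tokens from two adjacent frames might be mixed and flattened into 1-D sequences that a sequence model could then process.

```python
import numpy as np


def interleave_tokens(f0, f1):
    """Interleave two (H, W, C) token maps column-wise into one (H, 2W, C) map.

    Assumption: even columns come from frame 0, odd columns from frame 1.
    """
    h, w, c = f0.shape
    mixed = np.empty((h, 2 * w, c), dtype=f0.dtype)
    mixed[:, 0::2] = f0  # tokens of the first frame
    mixed[:, 1::2] = f1  # tokens of the second frame
    return mixed


def scan_orders(tokens):
    """Flatten an (H, W, C) token map into four 1-D scan sequences.

    Assumption: row-major, reversed row-major, column-major, and
    reversed column-major orders stand in for "multi-directional" scanning.
    """
    h, w, c = tokens.shape
    row_major = tokens.reshape(h * w, c)
    col_major = tokens.transpose(1, 0, 2).reshape(h * w, c)
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


# Toy example: frame 0 tokens are all 0, frame 1 tokens are all 1.
frame0 = np.zeros((2, 3, 1))
frame1 = np.ones((2, 3, 1))
mixed = interleave_tokens(frame0, frame1)
scans = scan_orders(mixed)
```

In the row-major scan of this toy input, tokens from the two frames alternate at every step, so any sequence model run over it sees neighboring tokens from both frames close together. Each scan has length 2·H·W, which is the sequence length that a linear-complexity model would process.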