mmFAS: Multimodal Face Anti-Spoofing Using Multi-Level Alignment and Switch-Attention Fusion
DOI: 10.1609/aaai.v39i1.31980
Publication Date: 2025-04-11
ABSTRACT
The increasing number of presentation attacks against face matching systems has raised concerns and drawn growing attention to face anti-spoofing (FAS). However, existing FAS methods commonly fuse multiple visual modalities (e.g., RGB, depth, and infrared) in a straightforward manner, disregarding the latent feature gap that can hinder representation learning. To address this challenge, we propose a novel multimodal FAS framework (mmFAS) that focuses on explicit alignment and fusion of latent features across different modalities. Specifically, we develop a multimodal alignment module that alleviates the latent feature gap by applying instance-level contrastive learning and class-level matching simultaneously. Further, we explore a new switch-attention-based fusion module to automatically aggregate complementary information and control model complexity. To evaluate anti-spoofing performance more rigorously, we adopt a challenging yet meaningful cross-database protocol involving four benchmark multimodal FAS datasets to simulate real-world scenarios. Extensive experimental results demonstrate the effectiveness of mmFAS in improving the accuracy of FAS systems, outperforming 10 representative methods.
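The abstract names two components: instance-level contrastive alignment across modalities and a switch-attention fusion that gates complementary information. The paper's implementation is not included here, so the following is only a minimal PyTorch sketch of those two ideas under stated assumptions: `info_nce` stands in for the instance-level contrastive alignment (the class-level matching term is omitted), and `SwitchAttentionFusion` is a generic gated cross-attention stand-in for switch-attention. All names, the gating design, the pooling, and the weight sharing are illustrative choices, not the authors' architecture.

```python
# Illustrative sketch only; not the mmFAS reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Instance-level contrastive alignment (assumed InfoNCE form):
    paired features from two modalities of the same face are pulled
    together, all other pairs in the batch are pushed apart."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)


class SwitchAttentionFusion(nn.Module):
    """Hypothetical gated ("switch") cross-modal fusion: a learned gate
    decides per sample how much each cross-attended modality contributes.
    A single shared attention block is reused for both directions here,
    one plausible way to keep model complexity bounded."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(),
            nn.Linear(dim, 2), nn.Softmax(dim=-1),
        )

    def forward(self, rgb: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        # rgb, aux: (B, N, dim) token sequences from two modality encoders.
        rgb2aux, _ = self.attn(rgb, aux, aux)       # RGB queries attend to aux
        aux2rgb, _ = self.attn(aux, rgb, rgb)       # aux queries attend to RGB
        pooled = torch.cat([rgb2aux.mean(1), aux2rgb.mean(1)], dim=-1)
        w = self.gate(pooled)                       # (B, 2) switch weights
        fused = w[:, :1, None] * rgb2aux + w[:, 1:, None] * aux2rgb
        return fused.mean(dim=1)                    # (B, dim) fused embedding


# Usage with random stand-in features (shapes are assumptions):
B, N, D = 8, 49, 256
rgb_tokens, depth_tokens = torch.randn(B, N, D), torch.randn(B, N, D)
fusion = SwitchAttentionFusion(D)
fused = fusion(rgb_tokens, depth_tokens)            # (8, 256)
align_loss = info_nce(rgb_tokens.mean(1), depth_tokens.mean(1))
```

In this reading, the contrastive term aligns the modality-specific latent spaces before fusion, while the softmax gate acts as the "switch" that weights the two cross-attention directions instead of stacking further attention layers; how the actual paper realizes either component may differ.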