Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

DOI: 10.1609/aaai.v34i07.7001
Published: 2020-06-19
ABSTRACT
Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. Such a characteristic heavily constrains their capability to learn implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module is dedicated to spatial representation and explicitly decomposes the visual features of different cues with the aid of a self-contained pose estimation branch. The TMC module models temporal correlations along two parallel paths, i.e., intra-cue and inter-cue, which aims to preserve the uniqueness of each cue and explore the collaboration of multiple cues. Finally, we design a joint optimization strategy to achieve end-to-end sequence learning of the STMC network. To validate the effectiveness, we perform experiments on three large-scale CSLR benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.
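
To make the two-path temporal design concrete, below is a minimal PyTorch sketch of one TMC-style block: an intra-cue path applies a temporal convolution per cue (preserving each cue's uniqueness), while an inter-cue path applies a temporal convolution over the concatenated cues (modeling their collaboration). The class name, cue count, feature dimension and kernel size are illustrative assumptions for exposition, not the authors' released implementation.

    # Illustrative sketch of a two-path temporal multi-cue block.
    # All names and hyperparameters here are assumptions, not the paper's code.
    import torch
    import torch.nn as nn

    class TMCBlock(nn.Module):
        """Two parallel temporal paths over per-cue feature sequences:
        intra-cue (one temporal conv per cue) and inter-cue (one temporal
        conv over the fused cues)."""
        def __init__(self, num_cues: int, dim: int, kernel_size: int = 5):
            super().__init__()
            pad = kernel_size // 2  # keep the temporal length unchanged
            # Intra-cue path: a separate temporal convolution for each cue.
            self.intra = nn.ModuleList(
                nn.Conv1d(dim, dim, kernel_size, padding=pad)
                for _ in range(num_cues)
            )
            # Inter-cue path: concatenate all cues, then one temporal conv.
            self.inter = nn.Conv1d(num_cues * dim, dim, kernel_size, padding=pad)

        def forward(self, cues):
            # Each cue tensor has shape (batch, dim, time).
            intra_out = [conv(c) for conv, c in zip(self.intra, cues)]
            inter_out = self.inter(torch.cat(cues, dim=1))
            return intra_out, inter_out

    # Toy usage: 3 cues (e.g., hands, face, full frame), dim 64, 100 frames.
    cues = [torch.randn(2, 64, 100) for _ in range(3)]
    intra, inter = TMCBlock(num_cues=3, dim=64)(cues)
    print([t.shape for t in intra], inter.shape)

In the paper, outputs of such blocks feed a joint optimization over the cue-specific and fused streams for end-to-end sequence learning; this sketch only shows the two-path layout described in the abstract.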