Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble

Keywords: RGB color model; Overfitting; Feature (linguistics)
DOI: 10.48550/arxiv.2110.06161
Publication Date: 2021-01-01
ABSTRACT
Sign language is commonly used by deaf or mute people to communicate, but it requires extensive effort to master. It is usually performed with fast yet delicate movements of hand gestures, body posture, and even facial expressions. Current Sign Language Recognition (SLR) methods that extract features via deep neural networks suffer from overfitting due to limited and noisy data. Recently, skeleton-based action recognition has attracted increasing attention for its subject-invariant and background-invariant nature, whereas skeleton-based SLR remains under-explored due to the lack of annotations. Some researchers have tried to use off-line pose trackers to obtain keypoints and aid in recognizing sign language via recurrent networks. Nevertheless, none of them outperforms RGB-based approaches yet. To this end, we propose a novel Skeleton Aware Multi-modal Framework with a Global Ensemble Model (GEM) for isolated SLR (SAM-SLR-v2) to learn and fuse multi-modal feature representations towards a higher recognition rate. Specifically, a Sign Language Graph Convolution Network (SL-GCN) model captures the embedded dynamics of the skeleton, and a Separable Spatial-Temporal Convolution Network (SSTCN) exploits skeleton features. Their predictions are fused with other RGB- and depth-based modalities by the proposed late-fusion GEM, which provides global information and makes a faithful prediction. Experiments on three datasets demonstrate that our SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance by significant margins. Our code will be available at https://github.com/jackyjsy/SAM-SLR-v2
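The abstract describes a late-fusion Global Ensemble Model (GEM) that combines per-modality predictions (skeleton-based SL-GCN and SSTCN, plus RGB and depth streams) into a final score. The sketch below is a minimal, illustrative take on weighted late fusion of class scores; the modality names, fusion weights, and class count are assumptions for demonstration, not the authors' implementation.

```python
# Minimal sketch of late-fusion ensembling over per-modality class scores.
# Modality names, weights, and the class count below are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def late_fusion(logits_per_modality, weights):
    """Weighted sum of per-modality class probabilities (global ensemble)."""
    fused = np.zeros_like(next(iter(logits_per_modality.values())))
    for name, logits in logits_per_modality.items():
        fused += weights[name] * softmax(logits)
    return fused

# Toy example: 4 modalities, a batch of 2 samples, 226 sign classes (assumed).
rng = np.random.default_rng(0)
num_classes = 226
logits = {m: rng.normal(size=(2, num_classes))
          for m in ("skeleton_gcn", "skeleton_sstcn", "rgb", "depth")}
weights = {"skeleton_gcn": 1.0, "skeleton_sstcn": 0.6, "rgb": 0.9, "depth": 0.4}

fused_scores = late_fusion(logits, weights)
predictions = fused_scores.argmax(axis=-1)
print(predictions.shape)  # (2,)
```

In practice the fusion weights could be tuned on a validation split or learned end-to-end; the point of the sketch is only the structure of combining independent modality classifiers after the fact.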