wav2vec-C: A Self-Supervised Model for Speech Representation Learning

DOI: 10.21437/interspeech.2021-717
Publication Date: 2021-08-27
ABSTRACT
Wav2vec-C introduces a novel representation learning technique combining elements from wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from partially masked speech encodings using a contrastive loss, in a way similar to wav2vec 2.0. However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features from the quantized representations, in a way similar to a VQ-VAE model. The proposed self-supervised model is trained on 10k hours of unlabeled data and subsequently used as the speech encoder in an RNN-T ASR model, fine-tuned with 1k hours of labeled data. This work is one of very few studies of self-supervised learning on a large volume of real far-field data. The wav2vec-C encoded representations achieve, on average, twice the error reduction over the baseline and a higher codebook utilization in comparison to wav2vec 2.0.
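
The abstract describes two coupled training objectives: a wav2vec 2.0-style contrastive loss over masked, quantized speech encodings, and a VQ-VAE-style consistency loss that reconstructs the input features from the codes. Below is a minimal PyTorch sketch of that combination, not the paper's implementation: the GRU encoder and context network, the nearest-neighbour straight-through quantizer, the masking scheme, the module sizes, and the equal loss weighting are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Wav2vecCSketch(nn.Module):
    """Illustrative sketch of the wav2vec-C idea: contrastive loss over
    masked, quantized encodings plus a consistency loss that reconstructs
    the input features from the codes. Names and sizes are hypothetical."""

    def __init__(self, feat_dim=80, hidden=256, codebook_size=320):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)  # stand-in for the paper's encoder
        self.codebook = nn.Embedding(codebook_size, hidden)        # VQ codebook
        self.context = nn.GRU(hidden, hidden, batch_first=True)    # stand-in for the context network
        self.decoder = nn.Linear(hidden, feat_dim)                 # consistency network: codes -> features
        self.mask_emb = nn.Parameter(torch.zeros(hidden))          # learned mask embedding

    def quantize(self, z):
        # Nearest-neighbour codebook lookup (the paper discusses other quantizers).
        d = torch.cdist(z, self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1))
        codes = self.codebook(d.argmin(dim=-1))                    # (B, T, H)
        # VQ-VAE-style codebook + commitment terms so the codebook is trained.
        vq_loss = F.mse_loss(codes, z.detach()) + 0.25 * F.mse_loss(z, codes.detach())
        q = z + (codes - z).detach()                               # straight-through estimator
        return q, vq_loss

    def forward(self, x, mask_prob=0.5):
        z, _ = self.encoder(x)                                     # latent speech encodings
        q, loss_vq = self.quantize(z)                              # quantized targets
        mask = torch.rand(z.shape[:2], device=z.device) < mask_prob
        z_masked = torch.where(mask.unsqueeze(-1), self.mask_emb.expand_as(z), z)
        c, _ = self.context(z_masked)                              # context vectors

        # Contrastive loss: identify the true quantized target among
        # distractors taken from other masked timesteps (simplified).
        c_m, q_m = c[mask], q[mask]                                # (N, H)
        logits = F.cosine_similarity(c_m.unsqueeze(1), q_m.unsqueeze(0), dim=-1) / 0.1
        labels = torch.arange(c_m.size(0), device=x.device)
        loss_contrastive = F.cross_entropy(logits, labels)

        # Consistency loss: reconstruct the input features from the codes.
        loss_consistency = F.mse_loss(self.decoder(q), x)

        return loss_contrastive + loss_consistency + loss_vq      # weighting omitted

# Usage on random features (batch of 2, 100 frames, 80-dim filterbanks):
model = Wav2vecCSketch()
loss = model(torch.randn(2, 100, 80))
loss.backward()

The consistency term is what distinguishes wav2vec-C from wav2vec 2.0 here: by anchoring the codes to the input features it regularizes the quantizer, which the abstract credits for the higher codebook utilization.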