
fairseq S2T: Fast Speech-to-Text Modeling with fairseq

FOS: Computer and information sciences; Computer Science - Computation and Language; Audio and Speech Processing (eess.AS); 0202 electrical engineering, electronic engineering, information engineering; FOS: Electrical engineering, electronic engineering, information engineering; 02 engineering and technology; Computation and Language (cs.CL); Electrical Engineering and Systems Science - Audio and Speech Processing

DOI: 10.48550/arxiv.2010.05171 Publication Date: 2020-01-01


AUTHORS (7)

Wang, Changhan

Tang, Yun

Ma, Xutai

Wu, Anne

Popuri, Sravya

Okhonko, Dmytro

Pino, Juan

ABSTRACT

Post-conference updates (accepted to AACL 2020 Demo).

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline and online inference. We implement state-of-the-art RNN-based, Transformer-based, and Conformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T documentation and examples are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.

