NFDI4DS | UHH-SEMS - Publication Details

LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech

Self-supervised learning; speech; speech processing; benchmark; dataset; French language; FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering; Sound (cs.SD); Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Computer Science - Sound; Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Electrical Engineering and Systems Science - Audio and Speech Processing; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; 004; 2000 MSC: 68T07, 68-04; PACS: 89.20.Ff, 07.05.Mh

DOI: 10.1016/j.csl.2024.101622
Publication Date: 2024-02-03T16:24:09Z

AUTHORS (22)

Titouan Parcollet

Ha Nguyen

Solène Evain

Marcely Zanon Boito

Adrien Pupier

Salima Mdhaffar

Hang Le

Sina Alisamir

Natalia Tomashenko

Marco Dinarelli

Shucong Zhang

Alexandre Allauzen

Maximin Coavoux

Yannick Estève

Mickael Rouvier

Jérôme Goulian

Benjamin Lecouteux

François Portet

Solange Rossato

Fabien Ringeval

Didier Schwab

Laurent Besacier

ABSTRACT

Published in Computer Speech & Language. Preprint allowed.

Self-supervised learning (SSL) has driven unprecedented improvements in many domains, including computer vision and natural language processing. Speech processing has benefited drastically from SSL, as most current tasks in the domain are now approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale corpora with up to 14,000 hours of heterogeneous speech; ten pre-trained SSL wav2vec 2.0 models, containing from 26 million to one billion learnable parameters, shared with the community; and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech, with an investigation of frozen versus fine-tuned downstream models and task-agnostic versus task-specific pre-trained models, as well as a discussion of the carbon footprint of large-scale model training. Overall, the newly introduced models trained on 14,000 hours of French speech outperform multilingual and previous LeBenchmark SSL models across the benchmark, but they also required up to four times more energy for pre-training.

REFERENCES (150)

CITATIONS (5)
