- Natural Language Processing Techniques
- Speech Recognition and Synthesis
- Speech and Dialogue Systems
- Topic Modeling
- Speech and Audio Processing
- Music and Audio Processing
- Intelligent Tutoring Systems and Adaptive Learning
- Subtitles and Audiovisual Media
- Phonetics and Phonology Research
- Mathematics, Computing, and Information Processing
- French Language Learning Methods
- Linguistics and Terminology Studies
- Privacy-Preserving Technologies in Data
- Semantic Web and Ontologies
- Video Analysis and Summarization
- Categorization, perception, and language
- Educational Tools and Methods
- Experimental Learning in Engineering
- Language, Linguistics, Cultural Analysis
- Lexicography and Language Studies
- Hate Speech and Cyberbullying Detection
- Innovations in Educational Methods
- Geophysical Methods and Applications
Laboratoire Informatique d'Avignon
2020-2024
Le Mans Université
2019-2020
University of Maine School of Law
2019
University of Sfax
2019
Self-Supervised Learning (SSL) using huge amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these results suggest it is possible to reduce the dependence on labeled data for building efficient speech systems, their evaluation was mostly made on ASR, with multiple and heterogeneous experimental settings (most of them in English). This...
This paper investigates methods to effectively retrieve speaker information from the personalized speaker-adapted neural network acoustic models (AMs) in automatic speech recognition (ASR). This problem is especially important in the context of federated learning of ASR, where a global model is learnt on a server based on the updates received from multiple clients. We propose an approach to analyze the AMs' footprint on a so-called Indicator dataset. Using this method, we develop two attack models that aim to infer the identity of the speaker behind an updated model without access to the actual...
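The linkage idea behind such an attack can be sketched with a deliberately tiny toy example (all names, weights and indicator samples below are hypothetical, not the paper's actual setup): compute each model's "footprint" on shared indicator samples, then attribute an anonymous update to the enrolled speaker whose footprint is most similar.

```python
import math

def footprint(weights, indicator):
    # Hypothetical footprint: outputs of a toy linear model on indicator samples.
    return [sum(w * x for w, x in zip(weights, sample)) for sample in indicator]

def cosine(a, b):
    # Cosine similarity between two footprint vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Shared indicator samples, two enrolled speaker models, one anonymous update.
indicator = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.3]]
speaker_a = [0.9, 0.1]
speaker_b = [0.2, 0.8]
update    = [0.85, 0.15]   # closer to speaker A's model

scores = {name: cosine(footprint(update, indicator), footprint(m, indicator))
          for name, m in [("A", speaker_a), ("B", speaker_b)]}
guess = max(scores, key=scores.get)
print(guess)  # the update is attributed to speaker "A"
```

The point of the sketch is only the linkage mechanism: no raw speech is exchanged, yet model parameters alone suffice to connect an update to a speaker.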
Modern Standard Arabic, as well as Arabic dialect languages, are usually written without diacritics. The absence of these marks constitutes a real problem in the automatic processing of such data by NLP tools. Indeed, writing without diacritics introduces several types of ambiguity. First, a word without diacritics could have many possible meanings depending on its diacritization. Second, an undiacritized surface form of an Arabic word might have as many as 200 readings, given the complexity of its morphology [12]. In fact, the agglutination property of Arabic can produce forms that can only...
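The ambiguity described above can be made concrete with a classic example: the undiacritized string كتب (k-t-b) admits several diacritized readings with distinct meanings. The toy lexicon below is only an illustration of the one-to-many mapping, not the paper's actual resource:

```python
# Toy lexicon: one undiacritized surface form maps to several
# diacritized readings, each with a different meaning.
readings = {
    "كتب": [
        ("كَتَبَ", "he wrote"),
        ("كُتُبٌ", "books"),
        ("كُتِبَ", "it was written"),
    ],
}

def ambiguity(form):
    # Number of candidate diacritizations for an undiacritized form.
    return len(readings.get(form, []))

print(ambiguity("كتب"))  # 3 candidate readings in this toy lexicon
```

A real diacritizer must pick among such candidates using context, which is exactly what makes undiacritized text hard for downstream NLP tools.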
The widespread availability of powerful personal devices capable of collecting the voice of their users has opened up the opportunity to build speaker-adapted automatic speech recognition (ASR) systems, or to participate in collaborative learning of ASR. In both cases, personalized acoustic models (AMs), i.e. AMs fine-tuned with speaker-specific data, can be built. A question that naturally arises is whether the dissemination of personalized acoustic models can leak personal information. In this paper, we show that it is possible to retrieve the gender of a speaker, but also his identity, by just exploiting the weight...
This paper presents a study on the use of federated learning to train an ASR model based on wav2vec 2.0 pre-trained by self-supervision. Carried out on the well-known TED-LIUM 3 dataset, our experiments show that such a model can obtain, with no language model, a word error rate of 10.92% on the official TED-LIUM 3 test set, without sharing any data from the different users. We also analyse the ASR performance for speakers depending on their participation in the federated learning. Since federated learning was first introduced for privacy purposes, we also measure its ability...
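The server-side aggregation step in this kind of setup is typically federated averaging (FedAvg): each client fine-tunes a local copy of the model and the server averages the weights, weighted by local data size, without ever seeing the audio. A minimal sketch on plain weight vectors (the client weights and dataset sizes below are made up for illustration):

```python
def fed_avg(client_weights, client_sizes):
    """Per-coordinate weighted average of client weights (FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Three clients fine-tune local copies; the server only sees the weights.
clients = [[0.2, 0.4], [0.6, 0.0], [0.4, 0.8]]
sizes = [100, 300, 100]  # number of local utterances per client
global_weights = fed_avg(clients, sizes)
print(global_weights)  # aggregated global model weights
```

In the real system each "weight vector" is the full set of wav2vec 2.0 parameters (or a subset of fine-tuned layers), but the aggregation arithmetic is the same.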
Recent works showed that end-to-end neural approaches tend to become very popular for spoken language understanding (SLU). Through the term end-to-end, one considers the use of a single model optimized to extract semantic information directly from the speech signal. A major issue with such models is the lack of paired audio and textual data with semantic annotation. In this paper, we propose an approach to build an SLU model in a scenario in which zero paired audio data is available. Our approach is based on an external model trained to generate a sequence of vectorial representations from text. These...
Antoine Laurent, Souhir Gahbiche, Ha Nguyen, Haroun Elleuch, Fethi Bougares, Thiol, Hugo Riguidel, Salima Mdhaffar, Gaëlle Laperrière, Lucas Maison, Sameer Khurana, Yannick Estève. Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023). 2023.
Hang Le, Florentin Barbier, Ha Nguyen, Natalia Tomashenko, Salima Mdhaffar, Souhir Gabiche Gahbiche, Benjamin Lecouteux, Didier Schwab, Yannick Estève. Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021). 2021.
Speech encoders pretrained through self-supervised learning (SSL) have demonstrated remarkable performance in various downstream tasks, including Spoken Language Understanding (SLU) and Automatic Speech Recognition (ASR). For instance, fine-tuning SSL models for such tasks has shown significant potential, leading to improvements in the SOTA across challenging datasets. In contrast to existing research, this paper contributes by comparing the effectiveness of SSL approaches in the context of (i) low-resource spoken...
Recent works demonstrate that voice assistants do not perform equally well for everyone, but research on the demographic robustness of speech technologies is still scarce. This is mainly due to the rarity of large datasets with controlled demographic tags. This paper introduces the Sonos Voice Control Bias Assessment Dataset, an open dataset composed of voice assistant requests in North American English in the music domain (1,038 speakers, 166 hours, 170k audio samples, 9,040 unique labelled transcripts) with a controlled demographic diversity (gender, age,...
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available...
In speaker recognition systems, embeddings lack explicit speaker-related information, posing challenges for interpretability. Recently, a binary representation of speech extracts, where each coefficient indicates the presence or absence of a given voice attribute, has been proposed to overcome this lack. It consists of an adaptation of the x-vector extractor followed by a binarisation step. This approach proved its worth in terms of explainability, but suffers from two shortcomings. Firstly, the objective shared by the attribute modeling...
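The binarisation idea can be sketched in a few lines: threshold each coordinate of a continuous embedding so that a 1 means "attribute present" and a 0 means "absent", after which speakers can be compared by counting differing attributes. The attribute names and embeddings below are purely hypothetical; in the actual approach the attribute axes are learned jointly with the extractor, not hand-picked:

```python
# Hypothetical attribute axes for illustration only.
ATTRIBUTES = ["low_pitch", "breathy", "nasal", "fast_rate"]

def binarise(embedding, threshold=0.0):
    """Map a continuous speaker embedding to a binary attribute vector
    (1 = attribute present, 0 = absent)."""
    return [1 if v > threshold else 0 for v in embedding]

def hamming(a, b):
    # Interpretable comparison: number of attributes on which two speakers differ.
    return sum(x != y for x, y in zip(a, b))

emb_speaker1 = [0.8, -0.2, 0.1, -0.9]
emb_speaker2 = [0.7, -0.1, -0.3, -0.8]

b1, b2 = binarise(emb_speaker1), binarise(emb_speaker2)
print(b1, b2, hamming(b1, b2))  # binary codes and their attribute-level distance
```

The gain in interpretability is that a mismatch can be traced back to named attributes, whereas a cosine distance between dense x-vectors cannot.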
This paper presents the ongoing conception of a set of tools, based on the live transcription of speech during lectures, designed to instrument traditional lectures as well as web conferences or hybrid learning situations. The toolset exploits the interactions taking place during courses, keeps track of them and facilitates their reuse, both in the students' personal study and in future iterations of the course delivered by the teacher. Its goal is to help students stay focused on the teacher's explanations and to offer greater possibilities of interaction. A prototype was...
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains, including computer vision and natural language processing. Speech processing has drastically benefitted from SSL, as most current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000...