- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Neural Networks and Applications
- Natural Language Processing Techniques
- Speech and dialogue systems
- Topic Modeling
- Privacy-Preserving Technologies in Data
- Model Reduction and Neural Networks
- Digital Filter Design and Implementation
- Stochastic Gradient Optimization Techniques
- Internet Traffic Analysis and Secure E-voting
- Mobile Crowdsensing and Crowdsourcing
- Advanced Data Compression Techniques
- Cell Image Analysis Techniques
- Green IT and Sustainability
- Image and Video Stabilization
- Image Processing Techniques and Applications
- Wireless Networks and Protocols
- Ferroelectric and Negative Capacitance Devices
- Blasting Impact and Analysis
- Image and Signal Denoising Methods
- Semantic Web and Ontologies
- Machine Learning and Algorithms
- Emotion and Mood Recognition
Samsung (United Kingdom)
2023-2025
University of Cambridge
2021-2024
Samsung (South Korea)
2024
Laboratoire Informatique d'Avignon
2017-2023
Université d'Avignon et des Pays de Vaucluse
2016-2023
Campo Arqueológico de Mértola
2023
University of Oxford
2020
SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare, and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of speech benchmarks. It also provides training recipes, pretrained models, and inference...
Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud. However, FL is difficult to implement realistically, both in terms of scale and systems heterogeneity. Although there are a number of research frameworks available to simulate FL algorithms, they do not support the study of scalable FL workloads on heterogeneous edge devices. In this...
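The core aggregation step behind this kind of collaborative training can be sketched as a size-weighted average of per-client parameters. This is an illustrative FedAvg-style sketch only; `fedavg` and its signature are assumptions for this example, not Flower's API:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Size-weighted average of per-client parameter lists (FedAvg-style sketch)."""
    total = float(sum(client_sizes))
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Two clients, one parameter tensor each; client 2 holds three times more data,
# so its update contributes three times as much to the new global model.
w1, w2 = [np.array([2.0])], [np.array([4.0])]
new_global = fedavg([w1, w2], [100, 300])
```

Clients only ever send model parameters to the aggregator; the raw training data never leaves the device.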
The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch, which builds neural networks with the Python language, has recently spawned tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi...
Convolutional neural networks (CNN) have recently achieved state-of-the-art results in various applications. In the case of image recognition, an ideal model has to learn, independently of the training data, both the local dependencies between the three components (R, G, B) of a pixel and the global relations describing edges or shapes, making it efficient with small or heterogeneous datasets. Quaternion-valued convolutional neural networks (QCNN) address this problem by introducing multidimensional algebra to the CNN. This paper...
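The multidimensional algebra in question is quaternion arithmetic: a QCNN layer replaces the real-valued multiply-accumulate with the Hamilton product, so one quaternion weight mixes all components of the input jointly. A minimal sketch of that core operation (illustrative values, not the paper's implementation):

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of two quaternions (w, x, y, z) -- the core op of a QCNN layer."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

# An RGB pixel can be embedded as a pure quaternion (0, R, G, B); a single
# quaternion weight then mixes the three channels jointly instead of
# treating them as independent real features.
pixel = np.array([0.0, 0.2, 0.5, 0.1])
weight = np.array([0.9, 0.1, 0.0, 0.0])
out = hamilton(weight, pixel)
```

The non-commutative product is what lets the layer encode inter-channel dependencies with a quarter of the free parameters of an equivalent real-valued layer.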
Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short- and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or image recognition involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector. We propose a novel quaternion recurrent neural network (QRNN), alongside a quaternion long-short term memory network (QLSTM), that take into account both the external relations...
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly made on ASR and using multiple heterogeneous experimental settings (most of them for English). This...
Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure, often conducted in data centers. In response, alternatives to centralized training such as Federated Learning (FL) have emerged. Perhaps unexpectedly, FL is starting to be deployed at a global scale by companies that must adhere to new legal demands and policies originating from governments and social groups advocating for privacy protection. However, the potential impact...
Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has attracted a lot of attention recently. However, the FL scenarios often presented in the literature are artificial and fail to capture the complexity of real systems. In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data distributions using the French and Italian sets of the CommonVoice dataset, a large dataset containing thousands of different speakers, acoustic...
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches fostered the need and rise of extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most benchmarks rely upon a single decoding architecture that maps the frozen SSL representations to the downstream labels. This work investigates the robustness...
In Self-Supervised Learning (SSL), pre-training and evaluation are resource intensive. In the speech domain, current indicators of the quality of SSL models during pre-training, such as the loss, do not correlate well with downstream performance. Consequently, it is often difficult to gauge the final downstream performance in a cost-efficient manner during pre-training. In this work, we propose unsupervised methods that give insights into the quality of the pre-training of SSL speech models, namely, measuring the cluster quality and rank of the embeddings of the SSL model. Results show that these measures correlate better than...
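One common way to turn "rank of the embeddings" into a single unsupervised number is the entropy-based effective rank of the embedding matrix's singular-value spectrum. A minimal sketch of that measure (an illustrative proxy, not necessarily the exact estimator used in the paper):

```python
import numpy as np

def effective_rank(embeddings):
    """Entropy-based effective rank of an (N, d) embedding matrix.

    A collapsed representation concentrates its singular values and scores
    close to 1; a well-spread representation scores close to min(N, d).
    """
    s = np.linalg.svd(embeddings, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                       # drop zero singular values before the log
    return float(np.exp(-(p * np.log(p)).sum()))
```

Because it needs no labels and no decoder, such a measure can be tracked during pre-training at a tiny fraction of the cost of a full downstream evaluation.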
Rotary Position Embedding (RoPE) encodes relative and absolute positional information in Transformer-based models through rotation matrices applied to input vectors within sequences. While RoPE has demonstrated superior performance compared to other positional embedding technologies in natural language processing tasks, its effectiveness in speech processing applications remains understudied. In this work, we conduct a comprehensive evaluation of RoPE across diverse automatic speech recognition (ASR) tasks. Our experimental results...
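The rotation-matrix mechanism can be sketched in a few lines: each pair of feature dimensions is rotated by an angle proportional to the token's position, so the dot product between a rotated query and key depends only on their relative offset. A minimal illustrative sketch (not a production implementation):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate pairs of dimensions of x by position-dependent angles (RoPE sketch)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # one rotation frequency per pair
    ang = pos * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation applied to each (x1_i, x2_i) pair
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
# The rotated dot product depends only on the relative offset: 7-4 == 3-0.
rel_a = np.dot(rope(q, 7), rope(k, 4))
rel_b = np.dot(rope(q, 3), rope(k, 0))
```

This relative-position property is what makes RoPE attractive for variable-length sequences such as speech utterances.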
Deep learning contributes to reaching higher levels of artificial intelligence. Due to its pervasive adoption, however, growing concerns about the environmental impact of this technology have been raised. In particular, the energy consumed at training and inference time by modern neural networks is far from negligible and will increase even further due to the deployment of ever larger models. This work investigates for the first time the carbon cost of end-to-end automatic speech recognition (ASR). First, it quantifies the amount...
Machine Learning (ML) techniques have allowed a great performance improvement on different challenging Spoken Language Understanding (SLU) tasks. Among these methods, Neural Networks (NN), such as the Multilayer Perceptron (MLP), have recently received interest from researchers due to their capability to represent complex internal structures in a low-dimensional subspace. However, MLPs employ document representations based on basic word-level or topic-based features. Therefore, they reveal little in the way of statistical...
Modern end-to-end (E2E) Automatic Speech Recognition (ASR) systems rely on Deep Neural Networks (DNN) that are mostly trained on handcrafted and pre-computed acoustic features such as Mel filter banks or Mel-frequency cepstral coefficients. Nonetheless, and despite worse performances, E2E ASR models processing raw waveforms remain an active research field due to the lossless nature of the input signal. In this paper, we propose E2E-SincNet, a novel fully E2E model that goes from the raw waveform to the text transcripts by merging...
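The SincNet idea behind this model is to constrain the first convolutional layer to learnable band-pass filters described by only two parameters each: the low and high cut-off frequencies of a windowed sinc impulse response. A minimal sketch of such a filter (illustrative parameter values; the actual layer learns the cut-offs by backpropagation):

```python
import numpy as np

def sinc_bandpass(low_hz, high_hz, taps=101, sr=16000):
    """Band-pass FIR filter parametrized only by its two cut-off frequencies."""
    f1, f2 = low_hz / sr, high_hz / sr            # normalized cut-off frequencies
    n = np.arange(-(taps // 2), taps // 2 + 1)
    # difference of two low-pass sinc filters = band-pass impulse response
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return h * np.hamming(taps)                   # window to reduce spectral ripple

h = sinc_bandpass(1000, 3000)
H = np.abs(np.fft.rfft(h, 1024))                  # magnitude response
in_band = H[int(2000 / 16000 * 1024)]             # 2 kHz, inside the pass band
out_band = H[int(6000 / 16000 * 1024)]            # 6 kHz, in the stop band
```

Compared with a free convolution kernel of the same length, this parametrization is both far more compact and directly interpretable as a filter bank.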
Modern speech processing systems rely on self-attention. Unfortunately, token mixing with self-attention takes quadratic time in the length of the utterance, slowing down inference and training and increasing memory consumption. Cheaper alternatives to self-attention for ASR have been developed, but they fail to consistently reach the same level of accuracy. This paper, therefore, proposes a novel linear-time alternative to self-attention. It summarises an utterance with the mean over the vectors of all time steps. This single summary is then combined with time-specific...
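The mixing scheme described above can be sketched in a few lines: a single mean-pooled summary replaces the pairwise attention matrix, so the cost is linear rather than quadratic in the number of frames. An illustrative sketch under assumed shapes and transforms (the weight matrices and nonlinearities here are placeholders, not the paper's architecture):

```python
import numpy as np

def summary_mixing(X, Wf, Wg):
    """Linear-time token mixing: one utterance summary shared by every frame."""
    local = np.tanh(X @ Wf)                 # per-frame local transform
    summary = local.mean(axis=0)            # single summary: mean over all T steps
    # combine each time-specific vector with the shared utterance summary
    combined = np.concatenate([X, np.broadcast_to(summary, X.shape)], axis=-1)
    return np.tanh(combined @ Wg)

rng = np.random.default_rng(0)
T, d = 50, 8                                # 50 frames, 8 features each
X = rng.normal(size=(T, d))
Wf = rng.normal(size=(d, d))
Wg = rng.normal(size=(2 * d, d))
Y = summary_mixing(X, Wf, Wg)
```

Because every frame sees the same summary, doubling the utterance length only doubles the work, whereas self-attention's pairwise scores would quadruple it.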
Convolutional Neural Networks (CNN) have been used in Automatic Speech Recognition (ASR) to learn representations directly from the raw signal instead of hand-crafted acoustic features, providing a richer, lossless input signal. Recent research proposes to inject prior knowledge into the first convolutional layer by constraining the shape of its impulse responses, in order to increase both the interpretability of the learnt model and its performance. We combine complex Gabor filters with complex-valued deep neural networks to replace...
Federated Learning (FL) allows edge devices to collaboratively learn a shared prediction model while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud. Despite algorithmic advancements in FL, support for on-device training of FL algorithms remains poor. In this paper, we present an exploration of on-device FL on various smartphones and embedded devices using the Flower framework. We also evaluate the system costs and discuss how this quantification could be used to design more...
Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial for achieving lower downstream ASR error rates. Thus, better performance may come at the cost of longer inference times. This article explores different approaches that may be deployed during fine-tuning to reduce the computations needed in the SSL encoder, leading to faster inference. We adapt a number of...