Titouan Parcollet

ORCID: 0000-0003-0672-1346
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • Neural Networks and Applications
  • Natural Language Processing Techniques
  • Speech and dialogue systems
  • Topic Modeling
  • Privacy-Preserving Technologies in Data
  • Model Reduction and Neural Networks
  • Digital Filter Design and Implementation
  • Stochastic Gradient Optimization Techniques
  • Internet Traffic Analysis and Secure E-voting
  • Mobile Crowdsensing and Crowdsourcing
  • Advanced Data Compression Techniques
  • Cell Image Analysis Techniques
  • Green IT and Sustainability
  • Image and Video Stabilization
  • Image Processing Techniques and Applications
  • Wireless Networks and Protocols
  • Ferroelectric and Negative Capacitance Devices
  • Blasting Impact and Analysis
  • Image and Signal Denoising Methods
  • Semantic Web and Ontologies
  • Machine Learning and Algorithms
  • Emotion and Mood Recognition

Samsung (United Kingdom)
2023-2025

University of Cambridge
2021-2024

Samsung (South Korea)
2024

Laboratoire Informatique d'Avignon
2017-2023

Université d'Avignon et des Pays de Vaucluse
2016-2023

Campo Arqueologico de Mertola
2023

University of Oxford
2020

SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture, which supports several tasks of common interest, allowing users to naturally conceive, compare, and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance on a wide range of benchmarks. It also provides training recipes, pretrained models, inference...

10.48550/arxiv.2106.04624 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store data in the cloud. However, FL is difficult to implement realistically, both in terms of scale and systems heterogeneity. Although a number of research frameworks are available to simulate FL algorithms, they do not support the study of scalable FL workloads on heterogeneous devices. In this...

10.48550/arxiv.2007.14390 preprint EN other-oa arXiv (Cornell University) 2020-01-01
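The aggregation step at the heart of such FL systems can be sketched in a few lines. This is a plain-Python illustration of federated averaging with flat parameter lists and size-based weights; it is a simplification for exposition, not the Flower framework's actual API.

```python
def fed_avg(client_models, client_sizes):
    """Aggregate client model parameters, weighted by local dataset size.

    client_models: one flat list of float parameters per client
    client_sizes:  number of training examples held by each client
    """
    total = sum(client_sizes)
    n_params = len(client_models[0])
    global_model = [0.0] * n_params
    for params, size in zip(client_models, client_sizes):
        weight = size / total  # clients with more data contribute more
        for i, p in enumerate(params):
            global_model[i] += weight * p
    return global_model

# Two clients with unequal data: the average is pulled toward client 1.
print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [3, 1]))  # [1.5, 2.5]
```

In a real deployment each round would also ship the aggregated model back to a sampled subset of clients; simulating that sampling at scale is exactly what the frameworks discussed above struggle with.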

The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch, which builds neural networks with the Python language, has recently spawned tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi...

10.1109/icassp.2019.8683713 article EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17

10.1007/s10462-019-09752-1 article EN Artificial Intelligence Review 2019-08-16

10.21437/interspeech.2018-1898 preprint FR Interspeech 2018 2018-08-28

Convolutional neural networks (CNN) have recently achieved state-of-the-art results in various applications. In the case of image recognition, an ideal model has to learn independently from the training data both the local dependencies between the three components (R,G,B) of a pixel and the global relations describing edges or shapes, making it efficient with small or heterogeneous datasets. Quaternion-valued convolutional neural networks (QCNN) address this problem by introducing multidimensional algebra into the CNN. This paper...

10.1109/icassp.2019.8682495 preprint EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17
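The multidimensional algebra referred to above is the quaternion (Hamilton) product, which entangles the four components of each input with the four components of each weight. A minimal sketch with bare tuples, not any actual QCNN layer:

```python
def hamilton(p, q):
    """Hamilton product of two quaternions given as (r, x, y, z) tuples.

    In a quaternion-valued layer this product replaces the real-valued
    multiply, so e.g. the (R, G, B) components of a pixel are processed
    as one entity instead of three independent channels.
    """
    r1, x1, y1, z1 = p
    r2, x2, y2, z2 = q
    return (
        r1*r2 - x1*x2 - y1*y2 - z1*z2,
        r1*x2 + x1*r2 + y1*z2 - z1*y2,
        r1*y2 - x1*z2 + y1*r2 + z1*x2,
        r1*z2 + x1*y2 - y1*x2 + z1*r2,
    )

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
# i * j = k, and the product is not commutative: j * i = -k.
print(hamilton(i, j) == k)        # True
print(hamilton(j, i))             # (0, 0, 0, -1)
```

Because one quaternion weight mixes four input components, a quaternion layer needs roughly a quarter of the free parameters of its real-valued counterpart for the same input/output width.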

Recurrent neural networks (RNNs) are powerful architectures for modeling sequential data, due to their capability to learn short- and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or image recognition involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector. We propose a novel quaternion recurrent neural network (QRNN), alongside a quaternion long short-term memory network (QLSTM), that take into account both the external relations...

10.48550/arxiv.1806.04418 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce the dependence on labeled data for building efficient speech systems, their evaluation was mostly made on ASR under multiple heterogeneous experimental settings (most of them in English). This...

10.21437/interspeech.2021-556 article EN Interspeech 2021 2021-08-27

Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure, which is often conducted in data centers. In response, alternatives to centralized training, such as Federated Learning (FL), have emerged. Perhaps unexpectedly, FL is starting to be deployed at a global scale by companies that must adhere to new legal demands and policies originating from governments and social groups advocating for privacy protection. However, the potential environmental impact...

10.48550/arxiv.2102.07627 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has attracted a lot of attention recently. However, the FL scenarios often presented in the literature are artificial and fail to capture the complexity of real FL systems. In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data distributions using the French and Italian sets of the CommonVoice dataset, a large heterogeneous dataset containing thousands of different speakers, acoustic...

10.1109/icassp43922.2022.9747161 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches fostered the need for, and rise of, extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most benchmarks rely upon a single decoding architecture that maps the frozen SSL representations to the downstream labels. This work investigates the robustness...

10.21437/interspeech.2023-1087 article EN Interspeech 2023 2023-08-14

In Self-Supervised Learning (SSL), pre-training and evaluation are resource intensive. In the speech domain, current indicators of the quality of SSL models during pre-training, such as the loss, do not correlate well with downstream performance. Consequently, it is often difficult to gauge the final downstream performance in a cost-efficient manner during pre-training. In this work, we propose unsupervised methods that give insights into the quality of SSL models, namely, measuring the cluster quality and the rank of the embeddings of the SSL model. Results show that these measures correlate better than...

10.48550/arxiv.2501.05966 preprint EN arXiv (Cornell University) 2025-01-10

Rotary Position Embedding (RoPE) encodes relative and absolute positional information in Transformer-based models through rotation matrices applied to input vectors within sequences. While RoPE has demonstrated superior performance compared to other positional embedding technologies in natural language processing tasks, its effectiveness in speech applications remains understudied. In this work, we conduct a comprehensive evaluation of RoPE across diverse automatic speech recognition (ASR) tasks. Our experimental results...

10.48550/arxiv.2501.06051 preprint EN arXiv (Cornell University) 2025-01-10
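The rotation matrices mentioned above act on consecutive pairs of vector dimensions, each pair rotated by an angle proportional to the token's position. The sketch below is an illustrative single-vector version; real implementations batch this over heads and precompute the angles.

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply Rotary Position Embedding to one vector at position `pos`.

    Each pair (vec[i], vec[i+1]) is rotated by pos * base**(-i/d), so the
    dot product between a rotated query and a rotated key depends only on
    their relative position.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 0.0, 0.5, 0.5]
k = [0.0, 1.0, 0.5, -0.5]
# Relative-position property: shifting both positions by the same offset
# leaves the query/key score unchanged.
s1 = dot(rope(q, 3), rope(k, 1))
s2 = dot(rope(q, 12), rope(k, 10))
print(math.isclose(s1, s2))  # True
```

Position 0 is a no-op (angle zero), which is why RoPE needs no separate absolute position table.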

10.1109/icassp49660.2025.10889844 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Deep learning contributes to reaching higher levels of artificial intelligence. Due to its pervasive adoption, however, growing concerns about the environmental impact of this technology have been raised. In particular, the energy consumed at training and inference time by modern neural networks is far from negligible and will increase even further due to the deployment of ever larger models. This work investigates for the first time the carbon cost of end-to-end automatic speech recognition (ASR). First, it quantifies the amount...

10.21437/interspeech.2021-456 preprint EN Interspeech 2021 2021-08-27
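At its simplest, this kind of accounting reduces to an energy-to-carbon conversion. Every number in the example below is a hypothetical placeholder, not a figure from the paper:

```python
def training_co2_kg(gpu_hours, gpu_power_kw, pue, grid_kgco2_per_kwh):
    """Back-of-the-envelope carbon estimate for a training run.

    gpu_hours:          total device-hours (e.g. 4 GPUs x 100 h = 400)
    gpu_power_kw:       average draw per device, in kW
    pue:                data-center Power Usage Effectiveness (>= 1.0)
    grid_kgco2_per_kwh: carbon intensity of the local electricity grid
    """
    energy_kwh = gpu_hours * gpu_power_kw * pue
    return energy_kwh * grid_kgco2_per_kwh

# Hypothetical run: 400 GPU-hours at 0.3 kW, PUE 1.5, grid at 0.4 kgCO2/kWh.
print(round(training_co2_kg(400, 0.3, 1.5, 0.4), 6))  # 72.0
```

The grid-intensity term is why identical training runs can differ several-fold in carbon cost depending on where and when they are executed.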

Machine Learning (ML) techniques have allowed great performance improvements on different challenging Spoken Language Understanding (SLU) tasks. Among these methods, Neural Networks (NN), or Multilayer Perceptrons (MLP), have recently received interest from researchers due to their capability to represent complex internal structures in a low-dimensional subspace. However, MLPs employ document representations based on basic word-level or topic-based features. Therefore, they reveal little in the way of statistical...

10.1109/slt.2016.7846290 preprint EN 2016 IEEE Spoken Language Technology Workshop (SLT) 2016-12-01

Modern end-to-end (E2E) Automatic Speech Recognition (ASR) systems rely on Deep Neural Networks (DNN) that are mostly trained on handcrafted and pre-computed acoustic features such as Mel-filter-banks or Mel-frequency cepstral coefficients. Nonetheless, despite worse performances, E2E ASR models processing raw waveforms are an active research field due to the lossless nature of the input signal. In this paper, we propose E2E-SincNet, a novel fully E2E model that goes from the raw waveform to the text transcripts by merging...

10.1109/icassp40776.2020.9053954 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09
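The SincNet-style front end such a model builds on constrains each first-layer filter to a parametrised band-pass, so only the two band edges are learned instead of every filter tap. A plain-Python sketch of that parametrisation (filter length and band edges below are arbitrary choices):

```python
import math

def sinc_bandpass(f_low, f_high, length, sample_rate):
    """Band-pass impulse response as a difference of two sinc low-passes.

    f_low, f_high: band edges in Hz (the two trainable parameters)
    length:        number of filter taps (odd, for a symmetric filter)
    """
    def lowpass(fc, t):
        x = 2 * fc * t / sample_rate
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        return 2 * fc / sample_rate * sinc

    mid = (length - 1) / 2
    return [lowpass(f_high, n - mid) - lowpass(f_low, n - mid)
            for n in range(length)]

h = sinc_bandpass(300.0, 3400.0, 101, 16000)
# The impulse response is symmetric around its center (linear phase).
print(all(math.isclose(h[n], h[-1 - n], abs_tol=1e-12)
          for n in range(len(h))))  # True
```

In practice a window (e.g. Hamming) is applied to the taps to reduce ripple; it is omitted here for brevity.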

Modern speech processing systems rely on self-attention. Unfortunately, token mixing with self-attention takes quadratic time in the length of the utterance, slowing down inference and training and increasing memory consumption. Cheaper alternatives to self-attention for ASR have been developed, but they fail to consistently reach the same level of accuracy. This paper, therefore, proposes a novel linear-time alternative to self-attention. It summarises an utterance with the mean over vectors of all time steps. This single summary is then combined with time-specific...

10.21437/interspeech.2024-40 article EN Interspeech 2024 2024-09-01
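The linear-time mixing idea can be sketched directly: compute one mean summary per utterance and blend it into each frame. The scalar blend weights below stand in for the learned projections of the actual model:

```python
def summary_mixing(frames, w_local=0.5, w_summary=0.5):
    """Linear-time token mixing: one utterance-level mean summary,
    combined with each time-specific vector.

    frames: list of per-time-step feature vectors (lists of floats)
    """
    n = len(frames)
    d = len(frames[0])
    # One pass over the utterance: O(n * d), versus O(n^2 * d) for
    # pairwise self-attention scores.
    summary = [sum(f[j] for f in frames) / n for j in range(d)]
    return [[w_local * f[j] + w_summary * summary[j] for j in range(d)]
            for f in frames]

frames = [[1.0, 0.0], [3.0, 2.0]]
print(summary_mixing(frames))  # [[1.5, 0.5], [2.5, 1.5]]
```

Every output frame sees the whole utterance through the shared summary, which is how the method retains global context without computing pairwise attention scores.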

Convolutional Neural Networks (CNN) have been used in Automatic Speech Recognition (ASR) to learn representations directly from the raw signal instead of hand-crafted acoustic features, providing a richer and lossless input signal. Recent research proposes to inject prior knowledge into the first convolutional layer by constraining the shape of its impulse responses, in order to increase both the interpretability of the learnt acoustic model and its performances. We combine the complex Gabor filter with complex-valued deep neural networks to replace...

10.1109/icassp40776.2020.9054220 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09
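A complex Gabor filter is a Gaussian envelope modulated by a complex exponential, so each learnt filter has an interpretable center frequency and bandwidth. A minimal sketch; the bandwidth-to-sigma mapping used here is one common convention, not necessarily the paper's:

```python
import cmath
import math

def complex_gabor(center_freq, bandwidth, length, sample_rate):
    """Complex Gabor filter taps: Gaussian envelope x complex sinusoid.

    center_freq, bandwidth: Hz (the two interpretable parameters)
    """
    # One common convention: wider bandwidth -> narrower time envelope.
    sigma = sample_rate / (2 * math.pi * bandwidth)
    mid = (length - 1) / 2
    taps = []
    for n in range(length):
        t = n - mid
        envelope = math.exp(-(t ** 2) / (2 * sigma ** 2))
        carrier = cmath.exp(1j * 2 * math.pi * center_freq * t / sample_rate)
        taps.append(envelope * carrier)
    return taps

h = complex_gabor(1000.0, 100.0, 201, 16000)
# The Gaussian envelope peaks at the filter center tap.
print(max(range(len(h)), key=lambda n: abs(h[n])))  # 100
```

Feeding the real and imaginary parts of such filters to a complex-valued network preserves the phase information that a magnitude-only front end discards.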

Federated Learning (FL) allows edge devices to collaboratively learn a shared prediction model while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store data in the cloud. Despite algorithmic advancements in FL, support for on-device training of FL algorithms remains poor. In this paper, we present an exploration of on-device FL on various smartphones and embedded devices using the Flower framework. We also evaluate the system costs of on-device FL and discuss how this quantification could be used to design more...

10.48550/arxiv.2104.03042 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial for achieving lower downstream ASR error rates. Thus, better performance might come at the cost of slower inference. This article explores different approaches that may be deployed during fine-tuning to reduce the computations needed in the SSL encoder, leading to faster inference. We adapt a number of...

10.1109/icasspw59220.2023.10193042 article EN 2023-06-04