- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Neural Networks and Applications
- Natural Language Processing Techniques
- Speech and dialogue systems
- Topic Modeling
- Privacy-Preserving Technologies in Data
- Model Reduction and Neural Networks
- Digital Filter Design and Implementation
- Stochastic Gradient Optimization Techniques
- Internet Traffic Analysis and Secure E-voting
- Mobile Crowdsensing and Crowdsourcing
- Advanced Data Compression Techniques
- Cell Image Analysis Techniques
- Green IT and Sustainability
- Image and Video Stabilization
- Image Processing Techniques and Applications
- Wireless Networks and Protocols
- Ferroelectric and Negative Capacitance Devices
- Blasting Impact and Analysis
- Image and Signal Denoising Methods
- Semantic Web and Ontologies
- Machine Learning and Algorithms
- Emotion and Mood Recognition
Samsung (United Kingdom)
2023-2025
University of Cambridge
2021-2024
Samsung (South Korea)
2024
Laboratoire Informatique d'Avignon
2017-2023
Université d'Avignon et des Pays de Vaucluse
2016-2023
Campo Arqueológico de Mértola
2023
University of Oxford
2020
SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare, and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of speech benchmarks. It also provides training recipes, pretrained models, and inference...
Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud. However, FL is difficult to implement realistically, both in terms of scale and systems heterogeneity. Although there are a number of research frameworks available to simulate FL algorithms, they do not support the study of scalable FL workloads on heterogeneous edge devices. In this...
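The core aggregation step behind this kind of collaborative training can be sketched as a size-weighted average of per-client parameters. This is an illustrative FedAvg-style sketch only; `fedavg` and its signature are assumptions for this example, not Flower's API:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Size-weighted average of per-client parameter lists (FedAvg-style sketch)."""
    total = float(sum(client_sizes))
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Two clients, one parameter tensor each; client 2 holds three times more data,
# so its update contributes three times as much to the new global model.
w1, w2 = [np.array([2.0])], [np.array([4.0])]
new_global = fedavg([w1, w2], [100, 300])
```

Clients only ever send model parameters to the aggregator; the raw training data never leaves the device.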
The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch, which builds neural networks with the Python language, has recently spawned tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi...
Convolutional neural networks (CNN) have recently achieved state-of-the-art results in various applications. In the case of image recognition, an ideal model has to learn, independently of the training data, both the local dependencies between the three components (R, G, B) of a pixel and the global relations describing edges or shapes, making it efficient with small or heterogeneous datasets. Quaternion-valued convolutional neural networks (QCNN) address this problem by introducing multidimensional algebra to the CNN. This paper...
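The multidimensional algebra in question is quaternion arithmetic: a QCNN layer replaces the real-valued multiply-accumulate with the Hamilton product, so one quaternion weight mixes all components of the input jointly. A minimal sketch of that core operation (illustrative values, not the paper's implementation):

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of two quaternions (w, x, y, z) -- the core op of a QCNN layer."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

# An RGB pixel can be embedded as a pure quaternion (0, R, G, B); a single
# quaternion weight then mixes the three channels jointly instead of
# treating them as independent real features.
pixel = np.array([0.0, 0.2, 0.5, 0.1])
weight = np.array([0.9, 0.1, 0.0, 0.0])
out = hamilton(weight, pixel)
```

The non-commutative product is what lets the layer encode inter-channel dependencies with a quarter of the free parameters of an equivalent real-valued layer.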
Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short- and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or image recognition involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector. We propose a novel quaternion recurrent neural network (QRNN), alongside a quaternion long-short term memory network (QLSTM), that take into account both the external relations...
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly made on ASR and using multiple heterogeneous experimental settings (most of them for English). This...
Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure, often conducted in data centers. In response, alternatives to centralized training such as Federated Learning (FL) have emerged. Perhaps unexpectedly, FL is starting to be deployed at a global scale by companies that must adhere to new legal demands and policies originating from governments and social groups advocating for privacy protection. However, the potential impact...
Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has attracted a lot of attention recently. However, the FL scenarios often presented in the literature are artificial and fail to capture the complexity of real systems. In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data distributions using the French and Italian sets of the CommonVoice dataset, a large dataset containing thousands of different speakers, acoustic...
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches fostered the need and rise of extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most benchmarks rely upon a single decoding architecture that maps the frozen SSL representations to the downstream labels. This work investigates the robustness...
In Self-Supervised Learning (SSL), pre-training and evaluation are resource intensive. In the speech domain, current indicators of the quality of SSL models during pre-training, such as the loss, do not correlate well with downstream performance. Consequently, it is often difficult to gauge the final downstream performance in a cost-efficient manner during pre-training. In this work, we propose unsupervised methods that give insights into the quality of the pre-training of SSL speech models, namely, measuring the cluster quality and rank of the embeddings of the SSL model. Results show that these measures correlate better than...
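One common way to turn "rank of the embeddings" into a single unsupervised number is the entropy-based effective rank of the embedding matrix's singular-value spectrum. A minimal sketch of that measure (an illustrative proxy, not necessarily the exact estimator used in the paper):

```python
import numpy as np

def effective_rank(embeddings):
    """Entropy-based effective rank of an (N, d) embedding matrix.

    A collapsed representation concentrates its singular values and scores
    close to 1; a well-spread representation scores close to min(N, d).
    """
    s = np.linalg.svd(embeddings, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                       # drop zero singular values before the log
    return float(np.exp(-(p * np.log(p)).sum()))
```

Because it needs no labels and no decoder, such a measure can be tracked during pre-training at a tiny fraction of the cost of a full downstream evaluation.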
Rotary Position Embedding (RoPE) encodes relative and absolute positional information in Transformer-based models through rotation matrices applied to input vectors within sequences. While RoPE has demonstrated superior performance compared to other positional embedding technologies in natural language processing tasks, its effectiveness in speech processing applications remains understudied. In this work, we conduct a comprehensive evaluation of RoPE across diverse automatic speech recognition (ASR) tasks. Our experimental results...
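The rotation-matrix mechanism can be sketched in a few lines: each pair of feature dimensions is rotated by an angle proportional to the token's position, so the dot product between a rotated query and key depends only on their relative offset. A minimal illustrative sketch (not a production implementation):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate pairs of dimensions of x by position-dependent angles (RoPE sketch)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # one rotation frequency per pair
    ang = pos * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation applied to each (x1_i, x2_i) pair
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
# The rotated dot product depends only on the relative offset: 7-4 == 3-0.
rel_a = np.dot(rope(q, 7), rope(k, 4))
rel_b = np.dot(rope(q, 3), rope(k, 0))
```

This relative-position property is what makes RoPE attractive for variable-length sequences such as speech utterances.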
Deep learning contributes to reaching higher levels of artificial intelligence. Due to its pervasive adoption, however, growing concerns about the environmental impact of this technology have been raised. In particular, the energy consumed at training and inference time by modern neural networks is far from negligible and will increase even further due to the deployment of ever larger models. This work investigates for the first time the carbon cost of end-to-end automatic speech recognition (ASR). First, it quantifies the amount...
Machine Learning (ML) techniques have allowed a great performance improvement on different challenging Spoken Language Understanding (SLU) tasks. Among these methods, Neural Networks (NN), such as the Multilayer Perceptron (MLP), have recently received interest from researchers due to their capability to represent complex internal structures in a low-dimensional subspace. However, MLPs employ document representations based on basic word-level or topic-based features. Therefore, they reveal little in the way of statistical...
Modern end-to-end (E2E) Automatic Speech Recognition (ASR) systems rely on Deep Neural Networks (DNN) that are mostly trained on handcrafted and pre-computed acoustic features such as Mel filter banks or Mel-frequency cepstral coefficients. Nonetheless, and despite worse performances, E2E ASR models processing raw waveforms remain an active research field due to the lossless nature of the input signal. In this paper, we propose E2E-SincNet, a novel fully E2E model that goes from the raw waveform to the text transcripts by merging...
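The SincNet idea behind this model is to constrain the first convolutional layer to learnable band-pass filters described by only two parameters each: the low and high cut-off frequencies of a windowed sinc impulse response. A minimal sketch of such a filter (illustrative parameter values; the actual layer learns the cut-offs by backpropagation):

```python
import numpy as np

def sinc_bandpass(low_hz, high_hz, taps=101, sr=16000):
    """Band-pass FIR filter parametrized only by its two cut-off frequencies."""
    f1, f2 = low_hz / sr, high_hz / sr            # normalized cut-off frequencies
    n = np.arange(-(taps // 2), taps // 2 + 1)
    # difference of two low-pass sinc filters = band-pass impulse response
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return h * np.hamming(taps)                   # window to reduce spectral ripple

h = sinc_bandpass(1000, 3000)
H = np.abs(np.fft.rfft(h, 1024))                  # magnitude response
in_band = H[int(2000 / 16000 * 1024)]             # 2 kHz, inside the pass band
out_band = H[int(6000 / 16000 * 1024)]            # 6 kHz, in the stop band
```

Compared with a free convolution kernel of the same length, this parametrization is both far more compact and directly interpretable as a filter bank.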
Modern speech processing systems rely on self-attention. Unfortunately, token mixing with self-attention takes quadratic time in the length of the utterance, slowing down inference and training and increasing memory consumption. Cheaper alternatives to self-attention for ASR have been developed, but they fail to consistently reach the same level of accuracy. This paper, therefore, proposes a novel linear-time alternative to self-attention. It summarises an utterance with the mean over the vectors of all time steps. This single summary is then combined with time-specific...
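The mixing scheme described above can be sketched in a few lines: a single mean-pooled summary replaces the pairwise attention matrix, so the cost is linear rather than quadratic in the number of frames. An illustrative sketch under assumed shapes and transforms (the weight matrices and nonlinearities here are placeholders, not the paper's architecture):

```python
import numpy as np

def summary_mixing(X, Wf, Wg):
    """Linear-time token mixing: one utterance summary shared by every frame."""
    local = np.tanh(X @ Wf)                 # per-frame local transform
    summary = local.mean(axis=0)            # single summary: mean over all T steps
    # combine each time-specific vector with the shared utterance summary
    combined = np.concatenate([X, np.broadcast_to(summary, X.shape)], axis=-1)
    return np.tanh(combined @ Wg)

rng = np.random.default_rng(0)
T, d = 50, 8                                # 50 frames, 8 features each
X = rng.normal(size=(T, d))
Wf = rng.normal(size=(d, d))
Wg = rng.normal(size=(2 * d, d))
Y = summary_mixing(X, Wf, Wg)
```

Because every frame sees the same summary, doubling the utterance length only doubles the work, whereas self-attention's pairwise scores would quadruple it.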
Convolutional Neural Networks (CNN) have been used in Automatic Speech Recognition (ASR) to learn representations directly from the raw signal instead of hand-crafted acoustic features, providing a richer, lossless input signal. Recent research proposes to inject prior knowledge into the first convolutional layer by constraining the shape of its impulse responses, in order to increase both the interpretability of the learnt model and its performance. We combine complex Gabor filters with complex-valued deep neural networks to replace...
Federated Learning (FL) allows edge devices to collaboratively learn a shared prediction model while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud. Despite algorithmic advancements in FL, support for on-device training of FL algorithms remains poor. In this paper, we present an exploration of on-device FL on various smartphones and embedded devices using the Flower framework. We also evaluate the system costs and discuss how this quantification could be used to design more...
Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial for achieving lower downstream ASR error rates. Thus, better performance may come at the cost of longer inference times. This article explores different approaches that may be deployed during fine-tuning to reduce the computations needed in the SSL encoder, leading to faster inference. We adapt a number of...