Mickaël Rouvier

ORCID: 0000-0003-3541-3385
Research Areas
  • Speech Recognition and Synthesis
  • Natural Language Processing Techniques
  • Music and Audio Processing
  • Speech and Audio Processing
  • Topic Modeling
  • Speech and Dialogue Systems
  • Video Analysis and Summarization
  • Text Readability and Simplification
  • Sentiment Analysis and Opinion Mining
  • Advanced Text Analysis Techniques
  • Advanced Data Compression Techniques
  • Digital Media Forensic Detection
  • Biomedical Text Mining and Ontologies
  • Linguistics and Terminology Studies
  • Advanced Wireless Communication Techniques
  • Authorship Attribution and Profiling
  • Neural Networks and Applications
  • Rough Sets and Fuzzy Logic
  • Artificial Intelligence in Healthcare and Education
  • Image Retrieval and Classification Techniques
  • Advanced Image and Video Retrieval Techniques
  • Fault Detection and Control Systems
  • Complex Network Analysis Techniques
  • Industrial Technology and Control Systems
  • Face Recognition and Analysis

Laboratoire Informatique d'Avignon
2015-2024

Université d'Avignon et des Pays de Vaucluse
2008-2023

Le Mans Université
2012-2019

Aix-Marseille Université
2014-2016

Centre National de la Recherche Scientifique
2014-2016

Laboratoire d’Informatique Fondamentale de Marseille
2013-2015

Université Nantes Angers Le Mans
2012-2013

Large Language Models (LLMs) have demonstrated remarkable versatility in recent years, offering potential applications across specialized domains such as healthcare and medicine. Despite the availability of various open-source LLMs tailored for health contexts, adapting general-purpose LLMs to the medical domain presents significant challenges. In this paper, we introduce BioMistral, an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central. We conduct a...

10.18653/v1/2024.findings-acl.348 preprint EN Findings of the Association for Computational Linguistics: ACL 2024 2024-01-01
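
A minimal sketch of querying such a model, assuming the checkpoint is published on the Hugging Face Hub under the id BioMistral/BioMistral-7B (the id and prompt are illustrative, not taken from the paper):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BioMistral/BioMistral-7B"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Question: What is the mechanism of action of metformin?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))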

This paper presents the LIUM open-source speaker diarization toolbox, mostly dedicated to broadcast news. This tool includes both Hierarchical Agglomerative Clustering, using well-known measures such as BIC and CLR, and a new ILP clustering algorithm using i-vectors. Diarization systems are tested on French evaluation data from the ESTER, ETAPE and REPERE campaigns.

10.21437/interspeech.2013-383 article EN Interspeech 2013 2013-08-25
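
The BIC measure used in such agglomerative clustering reduces to a penalized log-determinant comparison between two clusters and their merge; a minimal numpy sketch (the lambda weight and toy data are illustrative):

import numpy as np

def delta_bic(x1: np.ndarray, x2: np.ndarray, lam: float = 1.0) -> float:
    """Delta-BIC between two clusters of feature vectors (rows = frames).
    Positive values suggest two distinct speakers; negative favours merging."""
    n1, d = x1.shape
    n2 = x2.shape[0]
    n = n1 + n2
    logdet = lambda x: np.linalg.slogdet(np.cov(x, rowvar=False))[1]
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return 0.5 * (n * logdet(np.vstack([x1, x2]))
                  - n1 * logdet(x1) - n2 * logdet(x2)) - penalty

# toy example: two clusters drawn from different Gaussians
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(200, 12))
b = rng.normal(3.0, 1.0, size=(200, 12))
print(delta_bic(a, b))   # large positive -> keep separate
print(delta_bic(a, a))   # negative -> merge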

This paper describes the system developed at LIF for the SemEval-2016 evaluation campaign. The goal of Task 4.A was to identify sentiment polarity in tweets. The system extends the state-of-the-art Convolutional Neural Networks (CNN) approach. We initialize the input representations with embeddings trained on different units: lexical, part-of-speech, and sentiment embeddings. Neural networks for each embedding space are trained separately, and the features extracted from their hidden layers are then concatenated as input to a fusion neural network. The system ranked 2nd and obtained an average F1 of 63.0%.

10.18653/v1/s16-1030 article EN cc-by Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) 2016-01-01
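
A compact PyTorch sketch of this fusion scheme, with one CNN per embedding space and their hidden layers concatenated into a fusion network (dimensions, vocabulary sizes, and the choice of two branches are illustrative):

import torch
import torch.nn as nn

class BranchCNN(nn.Module):
    """One CNN per embedding space (e.g. lexical or part-of-speech units)."""
    def __init__(self, vocab: int, dim: int, filters: int = 100):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.conv = nn.Conv1d(dim, filters, kernel_size=3, padding=1)
        self.hidden = nn.Linear(filters, 64)

    def forward(self, tokens):                           # (batch, seq)
        x = self.emb(tokens).transpose(1, 2)             # (batch, dim, seq)
        x = torch.relu(self.conv(x)).max(dim=2).values   # max-over-time pooling
        return torch.relu(self.hidden(x))                # hidden features to fuse

class FusionNet(nn.Module):
    """Concatenates the branch hidden layers, as in the scheme above."""
    def __init__(self, branches, classes: int = 3):
        super().__init__()
        self.branches = nn.ModuleList(branches)
        self.out = nn.Linear(64 * len(branches), classes)

    def forward(self, inputs):                  # one token tensor per branch
        feats = [b(t) for b, t in zip(self.branches, inputs)]
        return self.out(torch.cat(feats, dim=1))

net = FusionNet([BranchCNN(5000, 300), BranchCNN(50, 30)])
lex = torch.randint(0, 5000, (8, 20))           # lexical token ids
pos = torch.randint(0, 50, (8, 20))             # part-of-speech tag ids
print(net([lex, pos]).shape)                    # torch.Size([8, 3])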

Automatic Speech Recognition (ASR) transcription errors are commonly assessed using metrics that compare them with a reference transcription, such as Word Error Rate (WER), which measures spelling deviations from the reference, or semantic score-based metrics. However, these approaches often overlook what is understandable to humans when interpreting transcription errors. To address this limitation, a new evaluation is proposed that categorizes errors into four levels of severity, further divided into subtypes, based on...

10.48550/arxiv.2501.10879 preprint EN arXiv (Cornell University) 2025-01-18
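
For reference, the WER criticized above is a plain edit-distance ratio; a self-contained sketch:

def wer(ref: list[str], hyp: list[str]) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / len(ref),
    computed with standard edit-distance dynamic programming."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / len(ref)

print(wer("the cat sat on the mat".split(),
          "the cat sit on mat".split()))  # 2 errors / 6 words = 0.333...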

Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, Béatrice Daille, Pierre-Antoine Gourraud. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.

10.18653/v1/2023.acl-long.896 article EN cc-by 2023-01-01

Current speaker recognition systems, which are learned using wide training datasets and include sophisticated modelings, turn out to be very specific, sometimes providing disappointing results in real-life applications. Any shift between training and test data, in terms of device, language, duration, noise or other factors, tends to degrade detection accuracy. This study investigates unsupervised domain adaptation, when only a scarce and unlabeled "in-domain" development dataset is available. It details the relevance of different...

10.21437/interspeech.2019-1524 article EN Interspeech 2019 2019-09-13

We propose to study speaker diarization from a collection of audio documents. The goal is to detect speakers appearing in several shows. In our approach, each show of the collection is processed separately before the collection is collectively processed to group the involved speakers. Two clustering methods are studied for the overall processing of the collection: one uses the NCLR metric and the other is inspired by techniques based on i-vectors, mainly used in the speaker verification field. Both were evaluated on the whole training corpus of ESTER 2. The method using i-vectors achieves an error...

10.21437/interspeech.2012-580 article EN Interspeech 2012 2012-09-09
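
A toy sketch of the i-vector-based collection clustering idea: within-show clusters are represented by vectors and linked across shows by agglomerative clustering under a cosine distance (the random "i-vectors" and the 0.5 threshold are illustrative):

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy "i-vectors" for speech clusters coming from several shows; in
# practice these would be extracted by an i-vector front-end.
rng = np.random.default_rng(1)
spk_a = rng.normal(0, 1, 400)
spk_b = rng.normal(0, 1, 400)
ivectors = np.stack([v + rng.normal(0, 0.3, 400)
                     for v in (spk_a, spk_a, spk_b, spk_b, spk_a)])

# Cross-show linking: average-link clustering on cosine distances,
# thresholded to decide which within-show clusters share a speaker.
dist = pdist(ivectors, metric="cosine")
tree = linkage(dist, method="average")
print(fcluster(tree, t=0.5, criterion="distance"))  # e.g. [1 1 2 2 1]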

This paper proposes to learn a set of high-level feature representations through deep learning, referred to as Speaker Embeddings, for speaker diarization. Embedding features are taken from the hidden layer neuron activations of Deep Neural Networks (DNN), learned as classifiers that recognize a thousand speaker identities in a training set. Although learned for identification, the embeddings are shown to be effective for verification of particular speakers unseen in training. In particular, this approach is applied to speaker diarization. Experiments, conducted on the corpus...

10.1109/eusipco.2015.7362751 preprint EN 2015-08-01
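
A minimal PyTorch sketch of the idea: train a DNN to classify the training-set speaker identities, then read the embedding off a hidden layer rather than the softmax output (layer sizes are illustrative):

import torch
import torch.nn as nn

class SpeakerNet(nn.Module):
    def __init__(self, feat_dim=40, hidden=512, n_speakers=1000):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, n_speakers)  # identification head

    def forward(self, x):
        return self.head(self.body(x))

    def embed(self, x):
        # Hidden-layer activations serve as the speaker embedding,
        # usable for unseen speakers (verification / diarization).
        return self.body(x)

net = SpeakerNet()
frames = torch.randn(16, 40)      # a batch of acoustic feature vectors
print(net.embed(frames).shape)    # torch.Size([16, 512])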

The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such a joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U's participation in the SRE series of evaluations. The primary objective of the current paper is to summarize the results and lessons learned based on the twelve sub-systems and their fusion submitted to SRE'18. It is also our intention to present a shared view of the advancements, progresses, and major paradigm shifts that we have witnessed...

10.21437/interspeech.2019-1533 preprint EN Interspeech 2019 2019-09-13
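
Score-level fusion of sub-systems, as in such joint submissions, is commonly done with a linear logistic backend; a toy scikit-learn sketch (the simulated scores stand in for real sub-system outputs):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy scores from three sub-systems on the same trials (rows = trials).
rng = np.random.default_rng(2)
labels = rng.integers(0, 2, 2000)                 # 1 = target trial
scores = np.column_stack([
    labels * 2.0 + rng.normal(0, s, 2000)         # sub-systems of varying
    for s in (1.0, 1.5, 2.5)                      # discrimination power
])

fusion = LogisticRegression().fit(scores, labels)  # learn fusion weights
fused = fusion.decision_function(scores)           # fused trial scores
print(fusion.coef_)   # better sub-systems receive larger weights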

In this paper we present two datasets for Tamasheq, a developing language mainly spoken in Mali and Niger. These datasets were made available for the IWSLT 2022 low-resource speech translation track, and they consist of collections of radio recordings from daily broadcast news in Niger (Studio Kalangou) and Mali (Studio Tamani). We share (i) a massive amount of unlabeled audio data (671 hours) in five languages: French from Niger, Fulfulde, Hausa, Tamasheq and Zarma, and (ii) a smaller parallel corpus of 17 hours of Tamasheq audio with utterance-level translations in the French language....

10.48550/arxiv.2201.05051 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

In this paper we examine the use of semantically-aligned speech representations for end-to-end spoken language understanding (SLU). We employ the recently-introduced SAMU-XLSR model, which is designed to generate a single embedding that captures the semantics at the utterance level, semantically aligned across different languages. This model combines acoustic frame-level speech representation learning (XLS-R) with the Language Agnostic BERT Sentence Embedding (LaBSE) model. We show that using this model instead of the initial XLS-R improves...

10.1109/slt54892.2023.10023013 article EN 2022 IEEE Spoken Language Technology Workshop (SLT) 2023-01-09
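
The text side of that alignment target can be probed directly; a sketch assuming the sentence-transformers/LaBSE checkpoint (SAMU-XLSR itself trains a speech encoder to predict these embeddings, which is not reproduced here):

from sentence_transformers import SentenceTransformer, util

labse = SentenceTransformer("sentence-transformers/LaBSE")
emb = labse.encode([
    "turn on the kitchen light",          # English utterance transcript
    "allume la lumière de la cuisine",    # French translation
    "what is the weather tomorrow",       # unrelated utterance
])
print(util.cos_sim(emb[0], emb[1]))  # high: same semantics across languages
print(util.cos_sim(emb[0], emb[2]))  # low: different semantics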

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general domain data, specialized ones have emerged to more effectively treat specific domains. In this paper, we propose an original study of PLMs for the medical domain in the French language. We compare, for the first time, PLMs trained on both public data from the web and private data from healthcare establishments. We also evaluate different learning strategies on a set of biomedical tasks. In particular,...

10.1101/2023.04.03.535368 preprint EN public-domain bioRxiv (Cold Spring Harbor Laboratory) 2023-04-05

This paper describes the system developed at LIA for the SemEval-2017 evaluation campaign. The goal of Task 4.A was to identify sentiment polarity in tweets. The system is an ensemble of Deep Neural Network (DNN) models: Convolutional Neural Networks (CNN) and Recurrent Neural Networks with Long Short-Term Memory cells (RNN-LSTM). We initialize the input representation of the DNNs with different sets of embeddings trained on large datasets. The DNNs are combined using a score-level fusion approach. The system ranked 2nd and obtained an average recall of 67.6%.

10.18653/v1/s17-2128 article EN cc-by Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017-01-01

Currently there are a lot of algorithms for video summarization; however, most of them only represent visual information. In this paper, we propose two approaches for the construction of the summary using both audio and text. One approach focuses on static summaries, where the summary is a set of selected keyframes and keywords, to be displayed in a fixed area. The second addresses dynamic summaries, where segments are selected based on their audio and textual content to compose a new video sequence of predefined duration. Our approaches rely on an existing summarization algorithm, Video...

10.1145/2072298.2072068 article EN Proceedings of the 19th ACM International Conference on Multimedia 2011-11-28

We evaluate four state-of-the-art instruction-tuned large language models (LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English, such as named-entity recognition (NER), question-answering (QA), relation extraction (RE), etc. Our overall results demonstrate that the evaluated LLMs begin to approach the performance of state-of-the-art models in zero- and few-shot scenarios for most tasks, performing particularly well on the QA task, even though they have never...

10.48550/arxiv.2307.12114 preprint EN public-domain arXiv (Cornell University) 2023-01-01
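
An illustrative zero-shot prompt for one such clinical NER task; the wording, example note, and model id below are invented here, not taken from the paper:

prompt = """Extract all drug names and dosages from the clinical note below.
Return one entity per line in the form TYPE: text.

Note: Patient was started on metformin 500 mg twice daily; aspirin was
discontinued due to GI bleeding.

Entities:"""

# Any instruction-tuned causal LLM can be queried with this prompt, e.g.
# via the Hugging Face text-generation pipeline (model id is a placeholder):
from transformers import pipeline
generator = pipeline("text-generation", model="some/instruction-tuned-model")
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])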

This paper presents investigations about the automatic identification of video genre by audio channel analysis. Genre refers to editorial styles such as commercials, movies, sports... We propose and evaluate some methods based on both low and high level descriptors, in cepstral or time domains, but also analyzing the global structure of the document and its linguistic contents. Then, the proposed features are combined and their complementarity is evaluated. On a database composed of single-story web-videos, the best audio-only...

10.1109/taslp.2014.2387411 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2015-01-05
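
Low-level cepstral descriptors of the kind discussed above are easy to extract; a librosa sketch computing clip-level MFCC statistics (the file path and choice of statistics are illustrative):

import librosa
import numpy as np

# Cepstral descriptors from a video's audio track for genre classification.
y, sr = librosa.load("video_audio_track.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # (13, frames)

# A simple clip-level representation: per-coefficient statistics that a
# genre classifier can consume.
clip_features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(clip_features.shape)   # (26,)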

This paper describes a multi-modal person recognition system for video broadcast, developed for participation in the DefiRepere challenge. The main track of this challenge targets the identification of all persons occurring either in the audio modality (speakers) or in the image modality (faces). The system is developed by the PERCOL team, involving 4 research labs in France, and was ranked first at the 2014 Defi-Repere evaluation. The main scientific issue addressed is the combination of information extraction processes for improving performance on both modalities. In this paper, we present the strategy...

10.21437/interspeech.2014-146 article EN Interspeech 2014 2014-09-14


10.21437/interspeech.2017-1311 preprint FR Interspeech 2017 2017-08-16

Yanis Labrak, Adrien Bazoge, Richard Dufour, Beatrice Daille, Pierre-Antoine Gourraud, Emmanuel Morin, Mickael Rouvier. Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI). 2022.

10.18653/v1/2022.louhi-1.5 preprint EN cc-by 2022-01-01

Statistical classifiers operate on features that generally include both useful and useless information. These two types of information are difficult to separate in the feature domain. Recently, a new paradigm based on Latent Factor Analysis (LFA) was proposed to model this decomposition into useful and useless components. This method was successfully applied to speaker and language recognition tasks. In this paper, we study the use of LFA for video genre classification by using only the audio channel. We propose short-term cepstral...

10.21437/interspeech.2009-336 article EN Interspeech 2009 2009-09-06
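
The core LFA compensation step is a projection that removes an estimated low-rank nuisance subspace from the features; a numpy sketch with a random subspace standing in for the estimated one:

import numpy as np

def remove_nuisance(x: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Project features onto the complement of the nuisance subspace U,
    i.e. x - (x U) U^T, the compensation behind LFA-style decomposition."""
    return x - (x @ u) @ u.T

rng = np.random.default_rng(3)
d, k = 20, 3
u, _ = np.linalg.qr(rng.normal(size=(d, k)))   # orthonormal nuisance basis
feats = rng.normal(size=(100, d))              # e.g. cepstral features

clean = remove_nuisance(feats, u)
print(np.abs(clean @ u).max())                 # ~0: nuisance removed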

Our goal is to automatically identify people in TV news and debates without any predefined dictionary of people. In this paper, we focus on the problem of person identification beyond face authentication, in order to improve results not only where faces are detectable. We propose to use automatic scene analysis as features for identification. We exploit two features: scene classification (studio or report) and camera motion. Then, persons are identified by propagation strategies of overlaid names (OCR results) and speaker classes to specific shots....

10.1109/cbmi.2014.6849829 preprint EN 2014-06-01