NFDI4DS | UHH-SEMS - Publication Details

Benoît Favre

ORCID: 0000-0002-9777-4613

About

Contact & Profiles

OpenAlex: A5071505335

Research Areas

Natural Language Processing Techniques
Topic Modeling
Speech and dialogue systems
Speech Recognition and Synthesis
Advanced Text Analysis Techniques
Multimodal Machine Learning Applications
Video Analysis and Summarization
Speech and Audio Processing
Handwritten Text Recognition Techniques
Sentiment Analysis and Opinion Mining
Music and Audio Processing
Text Readability and Simplification
AI in Service Interactions
Linguistics and Discourse Analysis
Face recognition and analysis
Image Retrieval and Classification Techniques
Advanced Image and Video Retrieval Techniques
Language, Metaphor, and Cognition
Authorship Attribution and Profiling
Text and Document Classification Technologies
French Language Learning Methods
Language Development and Disorders
Semantic Web and Ontologies
Language, Discourse, Communication Strategies
Face and Expression Recognition

Affiliations

Aix-Marseille Université
2014-2024

Centre National de la Recherche Scientifique
2012-2024

Laboratoire d’Informatique Fondamentale de Marseille
2011-2024

Laboratoire d’Informatique et Systèmes
2018-2024

Université de Toulon
2019-2024

University of Amsterdam
2022

New York University
2017

Hasso Plattner Institute
2017

University of Potsdam
2017

University of California, San Diego
2017

Publications

A scalable global model for summarization

OPENALEX - Publications

Dan Gillick Benoît Favre

We present an Integer Linear Program for exact inference under a maximum coverage model for automatic summarization. We compare our model, which operates at the sub-sentence or "concept"-level, to a sentence-level model, previously solved with an ILP. Our model scales more efficiently to larger problems because it does not require a quadratic number of variables to address redundancy in pairs of selected sentences. We also show how to include sentence compression in the ILP formulation, which has the desirable property of performing compression and sentence selection...

10.3115/1611638.1611640 article EN 2009-01-01

Long story short – Global unsupervised models for keyphrase based meeting summarization

Korbinian Riedhammer Benoît Favre Dilek Hakkani‐Tür

10.1016/j.specom.2010.06.002 article EN Speech Communication 2010-06-17

Clusterrank: a graph based method for meeting summarization

Nikhil Garg Benoît Favre Korbinian Riedhammer Dilek Hakkani‐Tür

This paper presents an unsupervised, graph-based approach for extractive summarization of meetings. Graph-based methods such as TextRank have been used for sentence extraction from news articles. These methods model text as a graph, with sentences as nodes and edges based on word overlap. A sentence node is then ranked according to its similarity to other nodes. The spontaneous speech in meetings leads to incomplete, ill-formed sentences with high redundancy and calls for additional measures to extract relevant sentences. We propose an extension of the TextRank algorithm that clusters...

10.21437/interspeech.2009-456 article EN Interspeech 2009 2009-09-06

A global optimization framework for meeting summarization

Dan Gillick Korbinian Riedhammer Benoît Favre Dilek Hakkani‐Tür

We introduce a model for extractive meeting summarization based on the hypothesis that utterances convey bits of information, or concepts. Using keyphrases as concepts weighted by frequency, and an integer linear program to determine the best set of utterances, that is, the set covering as many concepts as possible while satisfying a length constraint, we achieve ROUGE scores at least as good as a ROUGE-based oracle derived from human summaries. This brings us to a critical discussion of ROUGE and the future of meeting summarization.

10.1109/icassp.2009.4960697 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2009-04-01

The CALO Meeting Assistant System

Gökhan Tür Andreas Stolcke Lynn Voss Stanley Peters Dilek Hakkani‐Tür and 17 more

The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper presents the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, topic identification and segmentation, question-answer pair identification, action item recognition, decision extraction, and summarization.

10.1109/tasl.2009.2038810 article EN IEEE Transactions on Audio Speech and Language Processing 2010-02-16

Toward a Competency-based Approach to Co-designing Technologies with People with Intellectual Disability

Andrew A. Bayor Margot Brereton Laurianne Sitbon Bernd Ploderer Filip Birčanin and 2 more

Ability-based design is a useful framework that centralizes the abilities (all that users can do) of people with disabilities in approaching the design of assistive technologies. However, although this framework aspires to support designing with people with all kinds of disabilities, it is mainly effective in supporting those whose abilities can be clearly defined and measured, in particular, physical and sensory attributes of ability. As a result, ability-based design only provides limited guidance in designing with people with intellectual disability, whose cognitive, physical, sensory, and practical abilities vary along...

10.1145/3450355 article EN ACM Transactions on Accessible Computing 2021-06-30

MultiLing 2015: Multilingual Summarization of Single and Multi-Documents, On-line Fora, and Call-center Conversations

George Giannakopoulos Jeff Kubina John M. Conroy Josef Steinberger Benoît Favre and 3 more

George Giannakopoulos, Jeff Kubina, John Conroy, Josef Steinberger, Benoit Favre, Mijail Kabadjov, Udo Kruschwitz, Massimo Poesio. Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2015.

10.18653/v1/w15-4638 preprint EN cc-by 2015-01-01

SENSEI-LIF at SemEval-2016 Task 4: Polarity embedding fusion for robust sentiment analysis

Mickaël Rouvier Benoît Favre

This paper describes the system developed at LIF for the SemEval-2016 evaluation campaign. The goal of Task 4.A was to identify sentiment polarity in tweets. The system extends the state-of-the-art Convolutional Neural Network (CNN) approach. We initialize the input representations with embeddings trained on different units: lexical, part-of-speech, and sentiment embeddings. Neural networks for each input space are trained separately, and then the representations extracted from their hidden layers are concatenated as input to a fusion neural network. The system ranked 2nd and obtained an average F1 of 63.0%.

10.18653/v1/s16-1030 article EN cc-by Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) 2016-01-01

Mapping the Communicative Landscape of Early Child-Caregiver Dialogue

Abhishek Agrawal Benoît Favre Abdellah Fourtassi

Early linguistic interaction plays a key role in children's social and cognitive development. However, there is a lack of quantitative studies that offer comprehensive insight into the communicative landscape that characterizes child-caregiver dialogues. In this study, we apply advanced Natural Language Processing (NLP) techniques to analyze multiple corpora, encompassing data from N = 609 individual children (aged 20 to 32 months old), over 2,500 conversations, and around 700k pairs of interactive turns. Our...

10.31234/osf.io/pkdq9_v1 preprint EN 2025-01-28

Nosocomial Bacteremia Clinical Significance of a Single Blood Culture Positive for Coagulase-Negative Staphylococci

Benoît Favre Stéphane Hugonnet Luci Corrêa Hugo Sax Peter Rohner and 1 more

Objectives: To describe the epidemiology of nosocomial coagulase-negative staphylococci (CoNS) bacteremia and to evaluate the clinical significance of a single blood culture positive for CoNS. Design: A 3-year retrospective cohort study based on data prospectively collected through hospital-wide surveillance. Bacteremia was defined according to CDC criteria, except that a single blood culture growing CoNS was not systematically considered a contaminant. All clinically significant cultures were considered for analysis. Setting: A large...

10.1086/502605 article EN Infection Control and Hospital Epidemiology 2005-08-01

Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective

Emmanuelle Salin Badreddine Farah Stéphane Ayache Benoît Favre

In recent years, joint text-image embeddings have significantly improved thanks to the development of transformer-based Vision-Language models. Despite these advances, we still need to better understand the representations produced by those models. In this paper, we compare pre-trained and fine-tuned representations at a vision, language and multimodal level. To that end, we use a set of probing tasks to evaluate the performance of state-of-the-art Vision-Language models and introduce new datasets specifically for multimodal probing. These datasets are carefully designed to address a range...

10.1609/aaai.v36i10.21375 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Speech segmentation and spoken document processing

Mari Ostendorf Benoît Favre Ralph Grishman Dilek Hakkani‐Tür Mary P. Harper and 12 more

Progress in both speech and language processing has spurred efforts to support applications that rely on spoken rather than written language input. A key challenge in moving from text-based documents to such spoken documents is that spoken language lacks explicit punctuation and formatting, which can be crucial for good performance. This article describes different levels of speech segmentation, approaches to automatically recovering segment boundary locations, and experimental results demonstrating the impact on several language processing tasks. The results also show a need for optimizing...

10.1109/msp.2008.918023 article EN IEEE Signal Processing Magazine 2008-04-23

Packing the meeting summarization knapsack

Korbinian Riedhammer Dan Gillick Benoît Favre Dilek Hakkani‐Tür

Despite considerable work in automatic meeting summarization over the last few years, comparing results remains difficult due to varied task conditions and evaluations. To address this issue, we present a method for determining the best possible extractive summary given an evaluation metric like ROUGE. Our oracle system is based on a knapsack-packing framework, and though the problem is NP-hard, it can be solved nearly optimally by a genetic algorithm. To frame new research in a meaningful context, we suggest presenting our...

10.21437/interspeech.2008-604 article EN Interspeech 2008 2008-09-22

Integrating prosodic features in extractive meeting summarization

Shasha Xie Dilek Hakkani‐Tür Benoît Favre Yang Liu

Speech contains additional information beyond the text that can be valuable for automatic speech summarization. In this paper, we evaluate how to effectively use acoustic/prosodic features for extractive meeting summarization, and how to integrate prosodic features with lexical and structural information for further improvement. To properly represent prosodic features, we propose different normalization methods based on speaker, topic, or local context information. Our experimental results show that using only the prosodic features, we achieve better performance than using the non-prosodic...

10.1109/asru.2009.5373302 article EN 2009-12-01

The CALO meeting speech recognition and understanding system

Gökhan Tür Andreas Stolcke L. Voss John Dowding Benoît Favre and 16 more

The CALO Meeting Assistant provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper summarizes the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, question-answer pair identification, action item recognition, decision extraction, and summarization.

10.1109/slt.2008.4777842 preprint EN 2008-12-01

Automatic human utility evaluation of ASR systems: does WER really predict performance?

Benoît Favre Kyla Cheung Siavash Kazemian Adam J. Lee Yang Liu and 7 more

We propose an alternative evaluation metric to Word Error Rate (WER) for the decision audit task of meeting recordings, which exemplifies how to evaluate speech recognition within a legitimate application context. Using machine learning on an initial seed of human-subject experimental data, our alternative metric handily outperforms WER, which correlates very poorly with human subjects’ success in finding decisions given ASR transcripts with a range of WERs.

10.21437/interspeech.2013-610 article EN Interspeech 2013 2013-08-25

Concept-based Summarization using Integer Linear Programming: From Concept Pruning to Multiple Optimal Solutions

Florian Boudin Hugo Mougard Benoît Favre

In concept-based summarization, sentence selection is modelled as a budgeted maximum coverage problem. As this problem is NP-hard, pruning low-weight concepts is required for the solver to find optimal solutions efficiently. This work shows that reducing the number of concepts in the model leads to lower ROUGE scores, and more importantly to the presence of multiple optimal solutions. We address these issues by extending the model to provide a single optimal solution, and eliminate the need for concept pruning using an approximation algorithm that achieves comparable performance to exact...

10.18653/v1/d15-1220 article EN cc-by Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing 2015-01-01

Robust named entity extraction from large spoken archives

Benoît Favre Frédéric Bechet Pascal Nocéra

Traditional approaches to Information Extraction (IE) from speech input simply consist in applying text-based methods to the output of an Automatic Speech Recognition (ASR) system. Although this gives satisfactory results on low Word Error Rate (WER) transcripts, we believe that a tighter integration of the IE and ASR modules can increase IE performance in more difficult conditions. More specifically, this paper focuses on the robust extraction of Named Entities where a temporal mismatch between training and test corpora occurs. We...

10.3115/1220575.1220637 article EN 2005-01-01
