- Topic Modeling
- Natural Language Processing Techniques
- Semantic Web and Ontologies
- Biomedical Text Mining and Ontologies
- Artificial Intelligence in Healthcare and Education
- Video Analysis and Summarization
- Music and Audio Processing
Laboratoire des signaux et systèmes
2024
CentraleSupélec
2023
Université Paris-Saclay
2023
Centre National de la Recherche Scientifique
2023
Training and evaluating language models increasingly requires the construction of meta-datasets: diverse collections of curated data with clear provenance. Natural language prompting has recently led to improved zero-shot generalization by transforming existing, supervised datasets into a diversity of novel pretraining tasks, highlighting the benefits of meta-dataset curation. While successful in general-domain text, translating these data-centric approaches to biomedical language modeling remains challenging, as labeled...
Large language models and multimodal language-vision models give impressive results on currently available summarization benchmarks, but are not designed to handle long documents. Most datasets are composed of either mono-modal documents or short ones. In order to develop models for understanding and summarizing real-world videoconference records, which are typically around 1 hour long, we propose a dataset of 9,103 such records extracted from the German National Library of Science and Technology (TIB) archive, along with their abstracts. Additionally,...