NFDI4DS | UHH-SEMS - Publication Details

Théo Gigant

ORCID: 0009-0003-6392-8519

About

Contact & Profiles

A5090317088

Research Areas

Topic Modeling
Natural Language Processing Techniques
Semantic Web and Ontologies
Biomedical Text Mining and Ontologies
Artificial Intelligence in Healthcare and Education
Video Analysis and Summarization
Music and Audio Processing

Laboratoire des signaux et systèmes
2024

CentraleSupélec
2023

Université Paris-Saclay
2023

Centre National de la Recherche Scientifique
2023

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

OPENALEX - Publications

Jason Fries, Leon Weber, Natasha Seelam, Gabriel Altay, Debajyoti Datta, and 38 more

Training and evaluating language models increasingly requires the construction of meta-datasets: diverse collections of curated data with clear provenance. Natural language prompting has recently led to improved zero-shot generalization by transforming existing, supervised datasets into a diversity of novel pretraining tasks, highlighting the benefits of meta-dataset curation. While successful in general-domain text, translating these data-centric approaches to biomedical language modeling remains challenging, as labeled...

10.48550/arxiv.2206.15076 preprint EN cc-by-sa arXiv (Cornell University) 2022-01-01

Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics

Théo Gigant, Camille Guinaudeau, Marc Décombas, Frédéric Dufaux

10.18653/v1/2024.emnlp-main.1078 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

TIB: A Dataset for Abstractive Summarization of Long Multimodal Videoconference Records

Théo Gigant, Frédéric Dufaux, Camille Guinaudeau, Marc Décombas

Large language models and multimodal language-vision models give impressive results on currently available summarization benchmarks, but are not designed to handle long multimodal documents. Most summarization datasets are composed of either mono-modal documents or short multimodal documents. In order to develop models designed for understanding and summarizing real-world videoconference records that are typically around 1 hour long, we propose a dataset of 9,103 videoconference records extracted from the German National Library of Science and Technology (TIB) archive, along with their abstract. Additionally,...

10.1145/3617233.3617238 article EN 2023-09-20
