NFDI4DS | UHH-SEMS - Publication Details

Thierry Poibeau

ORCID: 0000-0003-3669-4051

About

Contact & Profiles

A5000083284

Research Areas

Natural Language Processing Techniques
Topic Modeling
Semantic Web and Ontologies
Linguistics and Discourse Analysis
Advanced Text Analysis Techniques
Speech and dialogue systems
French Language Learning Methods
Biomedical Text Mining and Ontologies
Digital Humanities and Scholarship
Language and cultural evolution
Linguistics and terminology studies
Sentiment Analysis and Opinion Mining
Text Readability and Simplification
Historical Linguistics and Language Studies
Web Data Mining and Analysis
Language, Metaphor, and Cognition
Syntax, Semantics, Linguistic Variation
Translation Studies and Practices
Lexicography and Language Studies
Linguistics and language evolution
Linguistics and Cultural Studies
Complex Network Analysis Techniques
Cultural Insights and Digital Impacts
Computational and Text Analysis Methods
Phonetics and Phonology Research

Langues, Textes, Traitements Informatiques, Cognition
2015-2024

École Normale Supérieure - PSL
2009-2023

École Normale Supérieure
2011-2023

Université Sorbonne Nouvelle
2012-2022

Centre National de la Recherche Scientifique
2012-2022

Sorbonne Université
2016-2022

University of Pisa
2022

University of California, Berkeley
2020

Bocconi University
2020

University of Michigan
2020

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

OPENALEX - Publications

Edoardo Maria Ponti, Helen O’Horan, Yevgeni Berzak, Ivan Vulić, Roi Reichart, and 3 more

Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this...

10.1162/coli_a_00357 article EN cc-by-nc-nd Computational Linguistics 2019-06-25

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Crosslingual Lexical Semantic Similarity

Ivan Vulić, Simon Baker, Edoardo Maria Ponti, Ulla Petti, Ira Leviant, and 7 more

We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering data sets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili). Each language data set is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing a representative coverage of word classes (nouns, verbs, adjectives, adverbs), frequency ranks, similarity intervals, lexical fields,...

10.1162/coli_a_00391 article EN cc-by-nc-nd Computational Linguistics 2020-10-22

Probing for the Usage of Grammatical Number

Karim Lasri, Tiago Pimentel, Alessandro Lenci, Thierry Poibeau, Ryan Cotterell

A central quest of probing is to uncover how pre-trained models encode a linguistic property within their representations. An encoding, however, might be spurious; i.e., the model might not rely on it when making predictions. In this paper, we try to find an encoding that the model actually uses, introducing a usage-based probing setup. We first choose a behavioral task which cannot be solved without using the linguistic property. Then, we attempt to remove the property by intervening on the model’s representations. We contend that, if an encoding is used by the model, its removal should harm the performance...

10.18653/v1/2022.acl-long.603 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

The First Komi-Zyrian Universal Dependencies Treebanks

Niko Partanen, Rogier Blokland, KyungTae Lim, Thierry Poibeau, Michael Rießler

Two Komi-Zyrian treebanks were included in the Universal Dependencies 2.2 release. This article contextualizes the treebanks, discusses the process through which they were created, and outlines the future plans and timeline for the next improvements. Special attention is paid to the possibilities of using UD in the documentation and description of endangered languages.

10.18653/v1/w18-6015 article EN cc-by 2018-01-01

Towards Unrestricted, Large-Scale Acquisition of Feature-Based Conceptual Representations from Corpus Data

Barry Devereux, Nicholas Pilkington, Thierry Poibeau, Anna Korhonen

10.1007/s11168-010-9068-8 article EN Research on Language and Computation 2009-12-01

The multilingual named entity recognition framework

Thierry Poibeau

This paper presents a multilingual system designed to recognize named entities in a wide variety of languages (currently more than 12 languages are concerned). The system includes original strategies to deal with a wide variety of encoding character sets, analysis strategies, and algorithms to process these languages.

10.3115/1067737.1067772 preprint EN 2003-01-01

Event-based information extraction for the biomedical domain

Érick Alphonse, Mohamed Ould Abdel Vetah, Thierry Poibeau, Davy Weissenbacher, Sophie Aubin, and 7 more

This paper gives an overview of the Caderige project. This project involves teams from different areas (biology, machine learning, natural language processing) in order to develop high-level analysis tools for extracting structured information from biological bibliographical databases, especially Medline. The paper gives an overview of the approach and compares it to the state of the art.

10.3115/1567594.1567602 article EN 2004-01-01

SEx BiST: A Multi-Source Trainable Parser with Deep Contextualized Lexical Representations

KyungTae Lim, Cheoneum Park, Changki Lee, Thierry Poibeau

We describe the SEx BiST parser (Semantically EXtended Bi-LSTM parser) developed at Lattice for the CoNLL 2018 Shared Task (Multilingual Parsing from Raw Text to Universal Dependencies). The main characteristic of our work is the encoding of three different modes of contextual information for parsing: (i) Treebank feature representations, (ii) Multilingual word representations, (iii) ELMo representations obtained via unsupervised learning from external resources. Our parser performed well in the official end-to-end evaluation (73.02 LAS –...

10.18653/v1/k18-2014 article EN cc-by Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies 2018-01-01
